15 April, 2026 - Last week in TI

Agent Dojo: Expert Knowledge for AI Agents

Agent Dojo provides practitioner-grade expert knowledge to AI coding agents via MCP and REST APIs, helping them make better architectural decisions grounded in real production experience.

Dojo Web App – Create, manage, and query dojos from the web interface. Each dojo ingests domain knowledge (YouTube, docs) and builds a knowledge base for structured retrieval across multiple dimensions.
Add a Dojo – Spin up a new domain-specific dojo directly from the UI — pick a topic, provide sources, and it auto-builds the knowledge base. No CLI or manual pipeline needed.
OpenCode + Dojo Experiment – Tested Dojo with OpenCode (open-source coding agent) on 5 real production scenarios across PostgreSQL, K8s, Redis, DynamoDB. Dojo won 3, tied 2, lost 0 — strongest on problems where the textbook answer has a hidden failure mode.
Cursor Cloud Agent Skill Improvement – Used Dojo to improve Cursor Cloud Agent skills (ab-cursor-dev-setup). Tested improved skills on 3 real issues — 1 tie, 2 where improved skills produced measurably better output.

Try it out

EduLLM Science: High 90s Quality with Image & Stimulus Support

EduLLM Science now generates K-8 science questions, with a best quality score of 99% on grades 3 and 8 and grades 1 and 2 close behind at 96–97%. The rollout adds native image support via Gemini Pro and stimulus-based question generation, pushing accuracy well past the mid-90s baseline.

InceptBench Scores – Benchmark testing achieved a best score of 99% on grades 3 and 8, with grades 1 and 2 reaching 96–97%. More frequent runs will confirm these scores with higher confidence as we make further improvements.
Image Support via Gemini Pro – Science questions can now include images, enabling richer, standards-aligned content — a key driver of the grade 3 improvement from mid-90s to 99%.
Stimulus Input – Pass in a stimulus (diagram, passage) and the generator produces questions grounded in that context, supporting more authentic assessments.

Try it out

Athena Applets: Strong Quality Metrics & Multi-Grade Expansion

5 new lessons generated, 1 lesson approved this week. Total review time was 29 minutes (9.7 minutes per iteration), and the current review queue holds 184 lessons.

Production output – 5 new lessons generated and 1 lesson approved this week.
Efficient iteration cycle – 3 total iterations completed in the past 7 days (2 feedback rounds, 1 approval) with an average of 0 comments per lesson (0 total comments).
Multi-grade coverage – Lessons approved in Grade 3 (1 lesson), with a review queue of 184 lessons ready for validation (Grade 3: 74, Grade 3 Supporting: 38, Grade 4: 4, Grade 4 TEKS: 1, Grade 5 TEKS: 1, Grade 6: 60, Grade 6 TEKS: 2, Grade 7: 4).
New lessons by grade – Grade 3 Supporting: 5 lessons.
Review time insights – Average review time of 9.7 minutes per iteration (total 29 minutes).
Complete lesson catalog – See the full list of all uploaded lessons across grades in this lesson catalog with direct links to each lesson.

Try it out

Marauders Map: Tracking Stability, Cross-Camera Dedup & RBAC Email Notifications

Marauders Map upgrades its tracking backbone to BoT-SORT with OSNet ReID for dramatically fewer ID switches, introduces overlap-zone-based deduplication to eliminate ghost duplicates in multi-camera rooms, and integrates AWS SES for automated RBAC lifecycle emails — invite, approve, reject, suspend, and restore.

Tracking Stability via BoT-SORT ReID — Replaced motion-only Hungarian matching with BoT-SORT + OSNet appearance embeddings. ReID recovers associations through occlusions and a capped track buffer prevents stale-track buildup at ~1 Hz SAM3 cadence — fewer ID switches, longer track lifetimes, and cleaner floorplan trajectories.
Duplicate Elimination via Overlap Zones — Admins draw overlap polygons on the floorplan and assign a primary camera per zone. The backend suppresses unrecognised duplicate tracks from secondary cameras while preserving face-ID’d people from either camera. A frontend filter further drops unrecognised markers when a recognised identity is already present in the zone.
AWS SES Integration for RBAC Notifications — Automated lifecycle emails (invite, approve, reject, suspend, restore) via AWS SES with Jinja2 HTML templates. Admins get instant alerts on new sign-ups; pending users see status on each login attempt. Smart guards skip delivery to suspended accounts, and a dev-recipient whitelist enables safe non-production testing.
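The overlap-zone suppression rule described above can be sketched roughly as follows. This is a minimal illustration, not the Marauders Map backend: the `Zone` and `Track` types, the `suppress_duplicates` function, and the use of a ray-casting point-in-polygon test are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

Point = tuple[float, float]


@dataclass(frozen=True)
class Zone:
    """Hypothetical overlap zone: a floorplan polygon plus its primary camera."""
    polygon: list[Point]
    primary_camera: str


@dataclass(frozen=True)
class Track:
    """Hypothetical track record in floorplan coordinates."""
    track_id: int
    camera: str
    position: Point
    identity: Optional[str]  # face-ID result; None means unrecognised


def point_in_polygon(p: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going right from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def suppress_duplicates(tracks: list[Track], zones: list[Zone]) -> list[Track]:
    """Drop unrecognised tracks seen by a non-primary camera inside an overlap zone."""
    kept = []
    for t in tracks:
        if t.identity is not None:
            kept.append(t)  # face-ID'd people are preserved from either camera
            continue
        is_secondary_duplicate = any(
            t.camera != z.primary_camera and point_in_polygon(t.position, z.polygon)
            for z in zones
        )
        if not is_secondary_duplicate:
            kept.append(t)
    return kept
```

In this sketch, an unrecognised track from the primary camera and any recognised track always survive; only unrecognised secondary-camera tracks inside a zone are dropped. The frontend filter mentioned above would be a separate pass and is not modelled here.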

Learn More

Try it out

EduLLM Math: Improved pass rates & support for Math Grades 7 & 8

EduLLM Math has improved pass rates and added support for Math Grades 7 & 8.

Improved pass rates: Pass rates for Math Grades 1 through 6 have all improved to above 95%.
Support for Math Grades 7 & 8: Added support for Math Grades 7 & 8 standards, enabling generation of questions for these grade levels. Grades 7 and 8 currently score 94% and 89% on the benchmark, respectively.

Try it out

EduLLM Social Studies: Supervised Fine-Tuning Experiment (v1 -> v4p1)

We ran a 4-version supervised fine-tuning experiment for Social Studies and moved Grade 5 pass rate from 89.5% (v1) to 93.8% (v4p1), with stronger support scores and lower inference failures. The latest model sets a new internal best while establishing a clear path toward the 99% benchmark target.

Best model so far (v4p1) – Grade 5 PASS improved to 93.8% (from 89.5% in v1), SUP rose to 90.8% (from 86.2%), and INF dropped to 6.2% (from 10.5%) for a net quality gain across the full evaluation stack.
Question-type improvements – v4p1 outperformed v1 on every tracked type: fill-in +10.0pp, match +7.5pp, sequence +3.3pp, mcq +1.6pp, and msq +0.8pp, showing broad rather than narrow improvement.
Multi-grade validation – v4p1 currently scores PASS rates of 90.8% (G6), 90.7% (G7), and 84.2% (G8), leaving gaps to the 99% target of -8.2pp, -8.3pp, and -14.8pp respectively.
Experiment direction – Compact and thinking-data variants were both tested, and the current best result came from the 40k thinking setup, which will be used as the baseline for the next optimization cycle.
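The percentage-point figures above can be reproduced with a few lines of arithmetic. This is only a sketch for checking the numbers: the helper names `gap_to_target` and `delta_pp` are made up for this example, and the inputs are the PASS, SUP, and INF values quoted in the bullets.

```python
TARGET = 99.0  # benchmark target stated in the text


def gap_to_target(pass_rate: float, target: float = TARGET) -> float:
    """Signed gap in percentage points; negative means below the target."""
    return round(pass_rate - target, 1)


def delta_pp(new: float, old: float) -> float:
    """Change between two versions, in percentage points."""
    return round(new - old, 1)


# v4p1 PASS rates per grade, as quoted above
pass_rates = {"G6": 90.8, "G7": 90.7, "G8": 84.2}
gaps = {grade: gap_to_target(rate) for grade, rate in pass_rates.items()}

# Grade 5 movement from v1 to v4p1
grade5_changes = {
    "PASS": delta_pp(93.8, 89.5),
    "SUP": delta_pp(90.8, 86.2),
    "INF": delta_pp(6.2, 10.5),
}
```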

Try it out

EduPaid v2.25.0: Wallet Receipts, TimeBack Apps & Provider API

Parents can download wallet top-up receipts from the success screen and subscription history; providers get a new Apps section for TimeBack learning app registration and connection, a cleaner Manual Discounts view with pagination and full CSV export, and a public provider API for programmatic transaction rescheduling with richer billing data.

Download receipts for wallet top-ups: After a wallet top-up, parents can download a receipt from the confirmation screen; completed cash deposits in subscription history also include a download option on desktop and mobile, with clear feedback when a receipt is not yet available.
TimeBack apps, manual discounts & provider API: Register or connect a TimeBack learning app from the new Apps section (with an admin approval flow); the Manual Discounts tab excludes sibling discounts and adds server-side pagination and reliable paged CSV export; providers can reschedule transactions using their API key (no portal session), and get-student-billing now returns transaction_id on each record for reconciliation.

Learn More

BrainTrust: Finance Agent, Custom Models & More

BrainTrust continues to expand its capabilities this week — bringing powerful accounts payable intelligence via the Finance Agent and giving users direct control over the AI models powering their experts.

A/P Account Reconciliation – The Finance Agent can now fully reconcile Accounts Payable accounts, aging vendors into time buckets so teams can instantly see what’s current, overdue, and by how much.
Group-Level Consolidation – Run analyses across all subsidiaries in one go, giving finance teams a consolidated, bird’s-eye view of the entire group’s payables position without stitching reports together manually.
Prepaid Account Analyses (In Progress) – Upcoming support for prepaid account verification: the agent will cross-check balances against amortisation schedules to surface discrepancies between what should have been paid and what was actually paid.
Intercompany Drill-Downs with Multi-Currency Support (In Progress) – Improving intercompany account drill-downs to correctly handle entities that operate in different base currencies, ensuring accurate cross-entity reconciliation.
Custom Model per Expert – Users can now select and configure the AI model powering their expert, enabling deeper personalisation and fine-tuned performance for each use case.
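To make the A/P aging idea concrete, here is a minimal sketch of bucketing vendor invoices by days overdue. It is not the Finance Agent's implementation: the `Invoice` type, the `age_payables` function, and the bucket boundaries (current, 1-30, 31-60, 61-90, 90+) are assumptions chosen because they are a common aging layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    """Hypothetical open payable."""
    vendor: str
    due_date: date
    amount: Decimal


# (upper bound in days overdue, label); assumed bucket layout
BUCKETS = [(0, "current"), (30, "1-30"), (60, "31-60"), (90, "61-90")]


def bucket_for(days_overdue: int) -> str:
    """Map days overdue to an aging bucket; not-yet-due invoices are 'current'."""
    for upper, label in BUCKETS:
        if days_overdue <= upper:
            return label
    return "90+"


def age_payables(invoices: list[Invoice], as_of: date) -> dict[str, dict[str, Decimal]]:
    """Sum open amounts per vendor and aging bucket as of a given date."""
    report: dict[str, dict[str, Decimal]] = defaultdict(lambda: defaultdict(Decimal))
    for inv in invoices:
        days_overdue = (as_of - inv.due_date).days
        report[inv.vendor][bucket_for(days_overdue)] += inv.amount
    return {vendor: dict(buckets) for vendor, buckets in report.items()}
```

A group-level view could, under the same assumptions, be produced by running `age_payables` over the combined invoices of all subsidiaries, though currency handling would need separate treatment.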

Try it out

EduLLM ELA: Age-Appropriate Language Across K-12

Every generated question now uses vocabulary, sentence structure, and reading complexity matched to the target grade — ensuring assessments measure what they’re supposed to, not whether a student can decode language above their level.

Grade-matched language – A 1st grader sees simple words and short sentences; a 12th grader gets nuanced, challenging language. Every question meets the student where they are.
Why it matters – A question testing the right standard but written above a student’s reading level measures decoding, not comprehension. Age-appropriate language means fairer, more accurate assessments and fewer false negatives across every grade.

Try it out
