
22 April, 2026 - Last week in TI
- Rahul Subramaniam
- Releases
- April 22, 2026

Agent Dojo → Pivoting to Intelligent Tool Routing
After validating that LLMs are already trained on YouTube content — making Dojo’s original ‘YouTube knowledge for agents’ thesis no longer defensible — we pivoted to solving a harder, unsolved problem: intelligent tool routing for AI agents.
- Value Prop Research – Ran experiments comparing Dojo against Gemini given the same YouTube videos and the same questions. The results mostly overlapped. LLMs like Gemini are already trained on YouTube content (Google confirmed this), so the original thesis no longer holds. Independent deep research from both Gemini and ChatGPT reached the same conclusion. Full research report here.
- Market Gap Analysis – Researched the common agent pain points (memory, knowledge, observability, guardrails). Most of these spaces are already crowded with tools. Tool routing, specifically intelligent routing with continuous learning, is the one gap nobody has solved at production scale.
- The Problem – Agents with many tools lose accuracy, and their context windows bloat when every tool is loaded into context.
- The Solution – An MCP server that sits between the agent and all of its tools. The agent calls a single tool; the router works out which MCP server has the right tool, executes it, and returns the result. Because the router executes the tools itself, it automatically captures real telemetry (latency, success/failure, output quality) on every call. That telemetry feeds a continuous learning loop that tracks tool reliability, learns user preferences, and gets smarter the more agents use it.
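The routing-plus-telemetry loop can be made concrete with a minimal Python sketch. It is not Dojo's implementation, and every name in it is hypothetical (`ToolRouter`, `ToolEntry`, `ToolStats`). Keyword overlap stands in for whatever semantic matching a real router would use. A Laplace-smoothed success rate stands in for the learned reliability signal. MCP transport is omitted, so each tool is a plain Python callable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set, Tuple


@dataclass
class ToolStats:
    """Telemetry captured by the router on every call it executes."""
    calls: int = 0
    successes: int = 0
    total_latency_s: float = 0.0

    def reliability(self) -> float:
        # Laplace-smoothed success rate, so an untried tool starts at 0.5
        return (self.successes + 1) / (self.calls + 2)


@dataclass
class ToolEntry:
    server: str                # hypothetical downstream MCP server name
    name: str                  # tool name exposed by that server
    keywords: Set[str]         # stand-in for semantic tool descriptions
    run: Callable[[str], str]  # stand-in for the actual MCP tool call


class ToolRouter:
    """One entry point for the agent: select a tool, execute it, record telemetry."""

    def __init__(self) -> None:
        self.tools: List[ToolEntry] = []
        self.stats: Dict[Tuple[str, str], ToolStats] = {}

    def register(self, tool: ToolEntry) -> None:
        self.tools.append(tool)
        self.stats[(tool.server, tool.name)] = ToolStats()

    def select(self, request: str) -> Optional[ToolEntry]:
        words = set(request.lower().split())
        best, best_score = None, 0.0
        for tool in self.tools:
            relevance = len(words & tool.keywords)
            if relevance == 0:
                continue
            # Relevance is weighted by how reliable the tool has been so far
            score = relevance * self.stats[(tool.server, tool.name)].reliability()
            if score > best_score:
                best, best_score = tool, score
        return best

    def call(self, request: str) -> str:
        tool = self.select(request)
        if tool is None:
            return "no matching tool"
        stats = self.stats[(tool.server, tool.name)]
        start = time.perf_counter()
        try:
            result = tool.run(request)
            stats.successes += 1
        except Exception as exc:  # a failure is recorded as telemetry, not hidden
            result = f"error: {exc}"
        stats.calls += 1
        stats.total_latency_s += time.perf_counter() - start
        return result


def flaky_forecast(_: str) -> str:
    raise RuntimeError("upstream timeout")


router = ToolRouter()
router.register(ToolEntry("weather-a", "forecast", {"weather", "forecast"}, flaky_forecast))
router.register(ToolEntry("weather-b", "forecast", {"weather", "forecast"}, lambda q: "sunny"))
for _ in range(3):
    print(router.call("weather forecast for today"))  # error: upstream timeout, then sunny, sunny
```

In this example, the failing server drops out of rotation after one failed call. Its smoothed reliability falls below the healthy server's. A production router would presumably also weigh latency, output quality, and user preferences, which this sketch leaves out.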

EduPaid v2.26.0: Learning Track Creation & Custom Billing Intervals
Providers can self-serve new learning tracks from the portal with TimeBack app linkage and optional dynamic CF Subject selection, and standard subscriptions now support a Custom days billing rhythm for flexible, predictable charge schedules. Commitment plans still use the existing frequencies only.
- Create learning tracks from the portal: Open the guided create flow from Learning Tracks, link an approved TimeBack app, name the track, and mark it dynamic for subject-based scheduling; dynamic tracks include a searchable CF Subject picker from TimeBack, while non-dynamic tracks route through the EduPaid team with clear messaging in the flow.
- Custom billing intervals (every N days): Custom days join monthly, quarterly, and annual in Pricing Terms. Set the amount and the number of days between charges, and charges are scheduled in advance by calendar day; validation helps catch invalid combinations before save.
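Here is a minimal sketch of how custom-day scheduling and its validation might look. It assumes the first charge lands on the subscription start date and each later charge follows exactly N calendar days after the previous one. The function names, field names, and error messages are hypothetical. Only the rule that commitment plans keep the existing frequencies comes from the release notes.

```python
from datetime import date, timedelta
from typing import List, Optional

STANDARD_FREQUENCIES = {"monthly", "quarterly", "annual"}


def validate_pricing_terms(plan_type: str, frequency: str, amount_cents: int,
                           interval_days: Optional[int] = None) -> List[str]:
    """Return validation errors; an empty list means the terms can be saved."""
    errors: List[str] = []
    if amount_cents <= 0:
        errors.append("amount must be positive")
    if frequency == "custom_days":
        # Commitment plans keep the existing frequencies only
        if plan_type == "commitment":
            errors.append("commitment plans support only monthly, quarterly, or annual")
        if interval_days is None or interval_days < 1:
            errors.append("custom interval must be at least 1 day")
    elif frequency not in STANDARD_FREQUENCIES:
        errors.append(f"unknown frequency: {frequency}")
    return errors


def custom_day_schedule(start: date, interval_days: int, count: int) -> List[date]:
    """Next `count` charge dates, each `interval_days` calendar days after the previous one."""
    return [start + timedelta(days=interval_days * i) for i in range(count)]


print(validate_pricing_terms("standard", "custom_days", 4999, 14))  # []
print(custom_day_schedule(date(2026, 4, 22), 10, 3))
```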

EduLLM Science: Grade 1 Reaches 99% in Agentic Mode, SFT Mode Introduced
EduLLM Science reached 99% quality on Grade 1 in agentic mode this week, extending our highest-quality performance to one more grade. In parallel, we built and evaluated SFT-based models that exceeded 90% quality. Recent refinements have produced limited gains and some regression, so we are continuing to explore ways to improve them further.
- Grade 1 at 99% via Agentic Mode – We achieved 99% benchmark quality on Grade 1 through the agentic pipeline, showing that the workflow can now reach top-tier performance on another grade.
- SFT Models at 90%+ – We trained and tested several SFT variants, with the best runs surpassing 90% quality, while later refinements showed limited improvement or regression.
- Two-Track Improvement Plan – We are continuing to push agentic-mode gains across more grades while exploring new ways to improve SFT performance further.

EduLLM Social Studies: SFT Pipeline and InceptBench Results
We are now running a supervised fine-tuning (SFT) workflow for Social Studies that combines curated curriculum-aligned training data with benchmark-led iteration. The latest Incept-Social-SFT run (360/360 items generated and evaluated) delivered a 97.4 aggregate score with 93.9% pass rate and 82.9% variety score.
- SFT approach in production iteration – We train Social Studies models on curated question-generation data, evaluate on InceptBench, and use run-level diagnostics (pass, variety, latency, and failure checks) to decide the next tuning cycle.
- Latest benchmark outcome (Grade 5 run) – The completed run processed 360 items end-to-end with 0 errors, reached 97.4 aggregate score, 93.9% pass rate, and 82.9% variety score, with average generation latency of 8.78s.
- New Articles support on InceptBench – Social Studies now supports Articles as benchmark input, enabling article-grounded quality evaluation in addition to standard prompt-only flows.
- Next-week dashboard rollout – By next week, the dashboard will support both static articles and articles + interests for Social Studies across all grades, so teams can track grounded generation quality in both modes.

Athena Applets: Strong Quality Metrics & Multi-Grade Expansion
1 new lesson generated, 2 lessons approved this week. Total review time was 162.1 minutes (9.0 minutes per iteration), and the current review queue holds 152 lessons.
- Production output – 1 new lesson generated and 2 lessons approved this week.
- Efficient iteration cycle – 16 total iterations completed in the past 7 days (14 feedback rounds, 2 approvals), with an average of 5.6 comments per iteration (89 total comments).
- Multi-grade coverage – Lessons approved in Grade 3 (1 lesson) and Grade 4 TEKS (1 lesson), with a review queue of 152 lessons ready for validation (Grade 3: 46, Grade 3 Supporting: 35, Grade 4: 4, Grade 5 TEKS: 1, Grade 6: 60, Grade 6 TEKS: 2, Grade 7: 4).
- New lessons by grade – Grade 3 Supporting: 1 lesson.
- Review time insights – Average review time of 9.0 minutes per iteration (total 162.1 minutes).
- Complete lesson catalog – See the full list of all uploaded lessons across grades in this lesson catalog with direct links to each lesson.
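A little arithmetic cross-checks the per-grade queue figures and iteration counts above. The sketch below simply re-adds the published numbers. The dictionary keys mirror the grade labels in the list.

```python
# Review-queue counts per grade, as listed above
review_queue = {
    "Grade 3": 46,
    "Grade 3 Supporting": 35,
    "Grade 4": 4,
    "Grade 5 TEKS": 1,
    "Grade 6": 60,
    "Grade 6 TEKS": 2,
    "Grade 7": 4,
}
feedback_rounds, approvals, total_comments = 14, 2, 89

queue_total = sum(review_queue.values())
iterations = feedback_rounds + approvals
comments_per_iteration = round(total_comments / iterations, 1)

print(queue_total, iterations, comments_per_iteration)  # 152 16 5.6
```

The 5.6 average corresponds to 89 comments spread over 16 iterations.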

BrainTrust: Documentation, Finance Intelligence & QuickBooks Improvements
BrainTrust now has dedicated documentation to help users get started faster, and the Finance Agent has gained new capabilities for prepaid tracking, intercompany reconciliation, and cross-company QuickBooks analysis.
- Documentation Launch – Comprehensive docs are now live at docs.braintrust.ti.trilogy.com, covering onboarding walkthroughs and feature guides so users can hit the ground running without needing to reach out for support.
- Prepaid Account Drill-Down – The Finance Agent can now dig into prepaid accounts and compare current payment status against the amortization schedule, so you can immediately see whether payments are on track or falling behind.
- Intercompany Matching & Simplified Entity Flagging – Improved handling of multi-currency transactions in intercompany accounts reduces reconciliation friction across entities, and the agent now automatically flags simplified entities for easier review.
- QuickBooks Cross-Company Querying – Improved QuickBooks tools let the agent efficiently query across all 34 companies in one go, making it easy to analyze and compare data at the portfolio level.
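A small Python sketch can illustrate the prepaid drill-down and intercompany matching. Everything in it is an assumption for illustration:
- a straight-line amortization schedule;
- a single USD conversion table for multi-currency amounts;
- matching on a shared reference ID;
- a fixed tolerance.

None of the names come from BrainTrust or the QuickBooks API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


def straight_line_schedule(total: float, months: int) -> List[float]:
    """Cumulative amount scheduled by the end of each month (straight-line, assumed)."""
    return [round(total * (m + 1) / months, 2) for m in range(months)]


def prepaid_status(total: float, months: int, months_elapsed: int,
                   paid_to_date: float, tolerance: float = 0.01) -> str:
    """Compare payments made so far against what the schedule expects by now."""
    expected = total * min(months_elapsed, months) / months
    gap = paid_to_date - expected
    if gap < -tolerance:
        return f"behind by {-gap:.2f}"
    if gap > tolerance:
        return f"ahead by {gap:.2f}"
    return "on track"


@dataclass
class Txn:
    ref: str        # intercompany reference shared by both sides (assumed)
    amount: float
    currency: str


def match_intercompany(receivables: List[Txn], payables: List[Txn],
                       usd_rate: Dict[str, float],
                       tolerance_usd: float = 0.50) -> Tuple[List[str], List[str]]:
    """Pair one entity's receivables with another's payables after converting both to USD."""
    payables_by_ref = {t.ref: t for t in payables}
    matched, unmatched = [], []
    for r in receivables:
        p = payables_by_ref.get(r.ref)
        if p is not None and abs(r.amount * usd_rate[r.currency]
                                 - p.amount * usd_rate[p.currency]) <= tolerance_usd:
            matched.append(r.ref)
        else:
            unmatched.append(r.ref)
    return matched, unmatched


print(prepaid_status(12000, 12, 6, 5000))  # behind by 1000.00
```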

Marauders Map: Safety Alert Engine & Emergency Response Dashboard
Marauders Map now monitors campus safety in real time — a configurable alert engine detects restricted-zone breaches, tracks prolonged absences, and escalates to wellness checks, while a full Emergency Response Dashboard provides live accountability during evacuations or lockdowns with one-click PDF rescue reports for first responders.
- Geofence Alerts — Any zone can be marked as restricted. When the vision pipeline recognises a known person inside a restricted zone, a high-severity alert fires instantly and streams to every connected admin via Server-Sent Events. Per-person cooldowns prevent duplicate noise, and the system re-evaluates automatically whenever a zone’s restriction status changes.
- Absence & Wellness Check Escalation — The orchestrator continuously tracks how long each enrolled person has been unseen. When someone exceeds the configured absence threshold, an absence alert fires. If consecutive absence alerts cross a second configurable threshold, the system creates a new wellness check or updates the timestamp on an existing pending one — with auto-resolution the moment the person is detected again.
- Live Emergency Accountability Dashboard — Activating an evacuation or lockdown opens a dedicated full-screen dashboard with a live elapsed timer, headcount summary cards (enrolled, accounted, missing — broken down by role), an interactive colour-coded floorplan showing room status per floor, progress bars tracking rooms cleared and persons located, and a missing-persons panel with last known room, time since last seen, and one-click ‘Mark Safe’ overrides — all refreshed every 10 seconds via snapshot polling and real-time SSE events.
- PDF Rescue Report & Audit Trail — A single click generates a structured, multi-section PDF for emergency responders: metadata header, summary table, a red-highlighted priority table of missing persons with last known locations, room-wise occupancy with colour-coded status and nearest-exit assignments, full personnel rosters grouped by role, and a floor-level summary — timestamped and ready to hand off. Every status override is logged in an immutable event log, creating a complete auditable record from activation to resolution.
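The alert rules described above can be sketched as a small state machine:
- restricted-zone breaches with per-person cooldowns;
- re-evaluation when a zone's restriction changes;
- absence alerts;
- escalation to a wellness check that resolves on the next sighting.

This is a hypothetical Python sketch, not the Marauders Map orchestrator. Thresholds are placeholder values and timestamps are plain seconds. Absence alerts are assumed to repeat once per threshold window. An in-memory `alerts` list stands in for the Server-Sent Events stream.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class AlertEngine:
    restricted_zones: Set[str]
    cooldown_s: float = 300.0            # per-person geofence cooldown (placeholder)
    absence_threshold_s: float = 1800.0  # unseen time before an absence alert (placeholder)
    wellness_after: int = 2              # consecutive absence alerts before a wellness check
    alerts: List[Tuple[str, str, float]] = field(default_factory=list)  # stand-in for SSE
    wellness_checks: Dict[str, float] = field(default_factory=dict)     # pending: person -> updated_at
    last_seen: Dict[str, float] = field(default_factory=dict, init=False)
    current_zone: Dict[str, str] = field(default_factory=dict, init=False)
    last_geofence: Dict[str, float] = field(default_factory=dict, init=False)
    last_absence: Dict[str, float] = field(default_factory=dict, init=False)
    absence_streak: Dict[str, int] = field(default_factory=dict, init=False)

    def _check_geofence(self, person: str, zone: str, now: float) -> None:
        if zone not in self.restricted_zones:
            return
        last = self.last_geofence.get(person)
        if last is None or now - last >= self.cooldown_s:  # cooldown suppresses duplicates
            self.last_geofence[person] = now
            self.alerts.append(("geofence", person, now))

    def on_detection(self, person: str, zone: str, now: float) -> None:
        """Called when the vision pipeline recognises a known person in a zone."""
        self.last_seen[person] = now
        self.current_zone[person] = zone
        self.absence_streak[person] = 0
        self.last_absence.pop(person, None)
        self.wellness_checks.pop(person, None)  # auto-resolve on sighting
        self._check_geofence(person, zone, now)

    def set_restricted(self, zone: str, restricted: bool, now: float) -> None:
        """Toggle a zone and re-evaluate everyone currently seen in it."""
        if restricted:
            self.restricted_zones.add(zone)
        else:
            self.restricted_zones.discard(zone)
        for person, current in self.current_zone.items():
            if current == zone:
                self._check_geofence(person, zone, now)

    def tick(self, now: float) -> None:
        """Periodic pass over enrolled people to detect prolonged absences."""
        for person, seen in self.last_seen.items():
            last_alert = self.last_absence.get(person, seen)
            if now - seen < self.absence_threshold_s or now - last_alert < self.absence_threshold_s:
                continue
            self.last_absence[person] = now
            self.absence_streak[person] = self.absence_streak.get(person, 0) + 1
            self.alerts.append(("absence", person, now))
            if self.absence_streak[person] >= self.wellness_after:
                # Creates a pending check, or refreshes the timestamp on an existing one
                self.wellness_checks[person] = now


engine = AlertEngine({"server-room"}, cooldown_s=60)
engine.on_detection("student-42", "server-room", now=0)
engine.on_detection("student-42", "server-room", now=30)  # within cooldown: no new alert
print(engine.alerts)  # [('geofence', 'student-42', 0)]
```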

InceptBench v2.4.3: Audio/Video, Interests, Universal Renderer & More
InceptBench 2.4.3 ships a wave of new capabilities — audio/video question support, interest and stimulus-based evaluation, sequence and match question types, Wikipedia-backed factual checks for Social Studies, and a full migration to BAML that eliminates malformed-JSON errors for good.
- Audio & Video Support – InceptBench now evaluates audio and video question types end-to-end, expanding benchmark coverage beyond text and images.
- Interest & Stimulus Support – Evaluation pipelines now handle interest-tagged and stimulus-based content, enabling richer, context-grounded benchmarking.
- Fixed Image Evaluations – Resolved incorrect image evaluation results.
- Curriculum Gap Coverage via Sherlock – Used Sherlock to identify and fill curriculum gaps; SAT coverage is complete and rollout to other subjects is in progress this week.
- New Datasets with Interests – Added new benchmark datasets that include interest signals, enabling evaluation of interest-aware generation.
- Migrated to BAML – Replaced ad-hoc JSON parsing with BAML across the evaluation pipeline, eliminating the class of malformed-JSON errors that were causing silent failures.
- Sequence & Match Question Types – Added native support for sequence-type and match-type questions in InceptBench.
- Wikipedia-Based Factual Check for Social Studies – Integrated Wikipedia as a factual grounding source for Social Studies evaluations, improving answer-correctness verification.
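The BAML migration means malformed model output now fails loudly as a typed error instead of being parsed ad hoc and silently defaulted. BAML achieves this from declared output types. The sketch below is a plain-Python stand-in built on the standard library; it is not BAML's API. The `EvalResult` fields (`score`, `passed`, `reason`) are hypothetical.

```python
import json
from dataclasses import dataclass


class EvalParseError(ValueError):
    """Raised instead of silently defaulting when evaluator output is malformed."""


@dataclass(frozen=True)
class EvalResult:
    score: float
    passed: bool
    reason: str


def parse_eval(raw: str) -> EvalResult:
    """Parse and type-check one evaluator response, failing loudly on any mismatch."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise EvalParseError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise EvalParseError("expected a JSON object")
    score = data.get("score")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 100:
        raise EvalParseError(f"score must be a number in [0, 100], got {score!r}")
    passed = data.get("passed")
    if not isinstance(passed, bool):
        raise EvalParseError("passed must be a boolean")
    reason = data.get("reason", "")
    if not isinstance(reason, str):
        raise EvalParseError("reason must be a string")
    return EvalResult(float(score), passed, reason)


print(parse_eval('{"score": 92, "passed": true, "reason": "aligned"}'))
```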

EduLLM Math: Improved pass rates
EduLLM Math improved pass rates for Grades 1-3, and the Grade 3 SFT model reached a 90% pass rate.
- Improved pass rates: Pass rates for Math Grades 1, 2, and 3 rose from 95.49% to 97.57%, 95.96% to 98.27%, and 95.53% to 97.56%, respectively.
- SFT: The SFT model for Grade 3 hit a 90% pass rate.
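Each improvement is quick to express in percentage points. This sketch assumes the three before/after pairs map to Grades 1, 2, and 3 in the order listed.

```python
# (before, after) pass rates in percent, assumed to be Grades 1-3 in order
pass_rates = {1: (95.49, 97.57), 2: (95.96, 98.27), 3: (95.53, 97.56)}

# Percentage-point gain per grade, rounded to two decimals
gains = {grade: round(after - before, 2) for grade, (before, after) in pass_rates.items()}
print(gains)  # {1: 2.08, 2: 2.31, 3: 2.03}
```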

EduLLM SAT Math: Articles Generation Baseline
First Articles generation run for EduLLM SAT Math is live, with a 93.9% aggregate score and 75.0% pass rate across 8 evaluated items.
- Aggregate score – 93.9% across the evaluated set.
- Pass rate – 75.0% at the >=85% threshold (failed generations count as 0).
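The pass-rate rule stated above can be written down directly. An item passes at a score of at least 85%, and a failed generation counts as 0. The eight scores below are made up for illustration and are not the actual run data.

```python
from typing import List, Optional

PASS_THRESHOLD = 85.0  # minimum score for an item to pass


def pass_rate(scores: List[Optional[float]]) -> float:
    """Percent of items passing; None marks a failed generation, which counts as 0."""
    effective = [0.0 if s is None else s for s in scores]
    passed = sum(1 for s in effective if s >= PASS_THRESHOLD)
    return 100.0 * passed / len(effective)


# Hypothetical scores for 8 items; the last one failed to generate
example = [96.0, 91.0, 88.0, 85.0, 99.0, 93.0, 70.0, None]
print(pass_rate(example))  # 75.0
```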

EduLLM SAT (Reading and Writing)
EduLLM-SAT-RW generates SAT-style Reading and Writing practice questions across multiple question types.
- Static Articles for All 33 Standards – Added static articles for all 33 SAT-RW standards; these now score 95% on InceptBench, significantly improving content quality and assessment accuracy.
- Image Generation Component – Added an image generation component to dynamically generate images for questions and passages as needed. This component was also used to add article images for the 33 standards, enriching the learning experience with relevant visual content.