29 April, 2026 - Last week in TI

Alpha Coach Analysis Platform: Rebuilt UI & Analytics Dashboard

Coaches and admins get a fully redesigned interface with subject and campus breakdowns, cleaner session views, and new analytics showing call performance trends across the entire team.

  • Full UI Redesign – Every major screen rebuilt to match Alpha’s design system: app shell, sidebar, session list, recording detail, coach profile, team board, tracker, upload, and insights pages — all faster and cleaner.
  • Insights Dashboard – New analytics show calls broken down by subject, campus, and app, with date-range pills (7D / 30D / 90D / All-time) and daily/weekly/monthly trend views with hover tooltips.

InceptBench v2.5.3: Bug Fixes & Benchmark Refresh

InceptBench 2.5.3 ships targeted bug fixes alongside a ground-up rebuild of the benchmark pipeline — making it easier to add new sets, version snapshots cleanly, and run evaluations at higher concurrency across an expanded curriculum.

  • v2.5.3 Bug Fixes – Fixed a bug where the factual accuracy check ran for all subjects instead of only the applicable ones, and addressed Subject Matter Expert feedback on the AE Studio run.
  • Adding Benchmarks is Now 2 Minutes of Work – We eliminated over 110,000 benchmark requests by moving to versioned snapshots. Adding a new benchmark set used to be a heavy, time-consuming process — now it’s just a quick config entry and a script run, and everything else picks it up automatically. Old runs stay consistent with no drift.
  • Higher Concurrency & Global Queue – Increased concurrency limits and introduced a global queue for smoother, faster evaluation runs.
  • Expanded Curricula – Benchmark coverage has grown significantly across standards. Common Core now includes Math, Language, Reading, Science, and Social Studies — with Science extending all the way from Kindergarten through Grade 8, plus High School Biology Honors and Physics. SAT coverage spans both Math and Reading & Writing, covering the full PSAT-to-SAT range.
  • Interests Sets Added – New interest-tagged benchmark sets for CC/Science and CC/Social Studies.

EduLLM Science: 99% Agentic on K–8, SFT at 95–98%

Agentic mode now hits 99% on K–8 except Kindergarten and Grade 6 — measured without interests, which still cost about 2–3% pass rate; we are working to match this with interests on. High school curriculum needs another review (lighter standards, no chemistry yet). SFT runs land at 95–98% as we explore the next step.

  • 99% agentic on K–8 – Top scores across the band except K and Grade 6; reported without interests for now.
  • Interests – Adding interests currently drops pass rate ~2–3%; closing that gap is in progress.
  • High school – Curriculum to be reviewed again; fewer standards than K–8, chemistry not in place.
  • SFT – Models scoring 95–98% across runs; further improvements under exploration.

EduLLM Social Studies: Interests in the Generator & SFT Plateau on Grade 5

The Social Studies generator now supports Interest as a category alongside Questions, so runs can be evaluated with interest-tagged curriculum. We see a 2–3% pass-rate regression when interests are on—consistent with other subjects and an area we are closing. On SFT, Grade 5 quality has plateaued around 97%; the best-performing setup so far uses Mistral Small 24B with 75 items per request in the training dataset. A full Incept-Social-2 run for CC / Grade 7 / Social Studies with Questions + Interest (540 items, zero errors) landed 98.4% aggregate score, 95.9% pass rate, and 94.2% variety, with 38m 35s total duration and ~22s average generation latency.

  • Interests in the generator – Interest is supported in the generation pipeline for Social Studies; enabling it is essential for grounded, student-relevant items but currently costs about 2–3 percentage points in pass rate versus interest-off baselines.
  • SFT: Grade 5 at ~97% – Supervised fine-tuning for Grade 5 has plateaued near 97%; next gains likely need data mix, model, or batching changes rather than more of the same recipe.
  • Best SFT recipe so far – Mistral Small 24B with 75 items per request in the dataset produced the strongest results in our recent sweep.
  • Sample dashboard run (G7 + Interest) – 540 / 540 generated and evaluated, no errors, 98.4% aggregate, 95.9% pass, 94.2% variety (see screenshot).

Marauders Map: Cinematic Playback Engine — Immersive Video-Style Replays of Campus Activity

Marauders Map introduces a Cinematic Playback Engine that transforms raw position data into polished, video-style replays of campus activity — letting admins rewatch an entire school day, follow a specific student’s journey, or inspect any custom time window for any room, all from an immersive full-screen player with a glass-morphism UI.

  • School Overview Playback — A 120-second cinematic replay of the entire school day. A script-driven camera system pans across the floorplan through timed stages — title card, per-room deep-dives with peak-activity overlays showing occupancy counts and recognised students, an animated summary grid, and a closing card — with smooth lerped camera transitions and 3D animated glowing dots representing every person detected.
  • Student-Specific Playback — Select any recognised student and watch their personal 60–120 second replay. The camera follows the student across rooms, showing where they spent time, who they interacted with, and their movement patterns — presented as a focused, narrated story with room detail cards rather than raw tracking data.
  • Custom Playback Generation — Pick any date, time range, and scope (entire school or a specific room) from custom-styled date/time selectors. The backend generates a replay dataset asynchronously — stored in S3, tracked in DynamoDB — with real-time progress streaming via SSE. Completed playbacks persist across sessions and can be replayed anytime or deleted with full S3 cleanup.
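
The "smooth lerped camera transitions" mentioned above can be illustrated with a minimal sketch. This is not the Marauders Map implementation: the `Camera` fields, the `camera_at` helper, and the smoothstep easing are all assumptions chosen for illustration. It shows linear interpolation between two camera poses, with an easing curve so the motion starts and ends gently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    """Hypothetical camera pose over a 2D floorplan."""
    x: float
    y: float
    zoom: float


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return a + (b - a) * t


def smoothstep(t: float) -> float:
    """Ease-in/ease-out curve; clamps t to [0, 1] first."""
    t = max(0.0, min(1.0, t))
    return t * t * (3.0 - 2.0 * t)


def camera_at(start: Camera, end: Camera, t: float) -> Camera:
    """Pose at progress t of a transition from start to end (assumed easing)."""
    s = smoothstep(t)
    return Camera(
        x=lerp(start.x, end.x, s),
        y=lerp(start.y, end.y, s),
        zoom=lerp(start.zoom, end.zoom, s),
    )


overview = Camera(x=0.0, y=0.0, zoom=1.0)
room = Camera(x=10.0, y=20.0, zoom=2.0)
halfway = camera_at(overview, room, 0.5)
```

A script-driven camera could evaluate `camera_at` once per frame, with `t` derived from the elapsed time within the current stage.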

Athena Applets: Strong Quality Metrics & Multi-Grade Expansion

Two new lessons were generated and four lessons were approved this week. Total review time was 145 minutes (9.1 minutes per iteration), and the current review queue holds 127 lessons.

  • Production output – 2 new lessons generated and 4 lessons approved this week.
  • Efficient iteration cycle – 16 total iterations completed in the past 7 days (12 feedback rounds, 4 approvals) with an average of 3.0 comments per lesson (48 total comments).
  • Multi-grade coverage – Lessons approved in Grade 3 (4 lessons), with a review queue of 127 lessons ready for validation (Grade 3: 19, Grade 3 Supporting: 37, Grade 4: 4, Grade 5 TEKS: 1, Grade 6: 60, Grade 6 TEKS: 2, Grade 7: 4).
  • New lessons by grade – Grade 3 Supporting: 2 lessons.
  • Review time insights – Average review time of 9.1 minutes per iteration (total 145 minutes).
  • Complete lesson catalog – See the full list of all uploaded lessons across grades in this lesson catalog with direct links to each lesson.
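
As a quick sanity check, the figures reported above are consistent with each other. The sketch below uses only the numbers from this update; the variable names are illustrative, and dividing comments by iterations is an assumption about how the 3.0 average was derived.

```python
# Figures as reported in this update.
total_review_minutes = 145
feedback_rounds = 12
approvals = 4
total_comments = 48
review_queue = {
    "Grade 3": 19,
    "Grade 3 Supporting": 37,
    "Grade 4": 4,
    "Grade 5 TEKS": 1,
    "Grade 6": 60,
    "Grade 6 TEKS": 2,
    "Grade 7": 4,
}

# An iteration is either a feedback round or an approval.
iterations = feedback_rounds + approvals

# Average review time per iteration, rounded to one decimal place.
minutes_per_iteration = round(total_review_minutes / iterations, 1)

# Assumption: the reported 3.0 average is comments divided by iterations.
comments_per_iteration = total_comments / iterations

# The per-grade queue counts should add up to the reported queue size.
queue_total = sum(review_queue.values())

print(iterations, minutes_per_iteration, comments_per_iteration, queue_total)
```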

EduLLM SAT (Reading and Writing): Articles & MCQ Generator

This week we deployed the static articles to GCS, ran InceptBench evaluations at 99%+ aggregate with 100% pass rate across all three grades, and developed the MCQ question generator with local testing.

  • Static Articles on GCS – All 33 articles are now on a public GCS bucket and sync on every merge, so downstream services can use them directly.
  • InceptBench Results – Digital SAT: 99.3%, Digital PSAT/NMSQT and PSAT 10: 99.6%, Digital PSAT 8-9: 99.6%, all at 100% pass rate.
  • MCQ Question Generator – We developed the generator this week and tested it locally. Pass rates per grade: Digital SAT: 85%, Digital PSAT 8-9: 88%, Digital PSAT/NMSQT and PSAT 10: 88%. Next step is to deploy and benchmark on InceptBench.

BrainTrust: Prepaid Reconciliation, Month-End Automation & Smarter Response Controls

BrainTrust takes a big step forward in operational reliability and user experience this week — with prepaid account processing now live in production, month-end reporting automation on the way, and new controls that let users cancel and retry AI responses without losing their work.

  • Prepaid Account Processing – Prepaid reconciliation is now live in production. Users can now reconcile subsidiaries that include prepaid accounts.
  • Month-End Report Automation (In Progress) – We’re building automated month-end processing reports to eliminate manual reporting overhead and ensure timely, accurate financial summaries every month.
  • Cancel & Retry Responses – Users can now cancel an in-progress AI response and retry it in both Spaces and DMs, saving time and reducing friction when a task needs to be re-run.

EduLLM SAT Math: PSAT 8–9 Articles & Question Generation

PSAT 8–9 Articles reached 100% pass rate and 98.3% aggregate score on a full evaluated run. The question generation pipeline is wired up but has not run the benchmark set yet; local runs so far show ~94% pass rate and ~93% aggregate score.

  • PSAT 8–9 Articles – 100.0% pass rate and 98.3% aggregate score with all items generated and evaluated.
  • Question generation pipeline – End-to-end pipeline ready; benchmark set not run yet. Local validation: ~94% pass rate and ~93% aggregate score.

EduLLM Math: Improved pass rates and SFT model

EduLLM Math improved pass rates for Grades 4–7 and now has a baseline SFT model for Grade 4.

  • Improved pass rates: Pass rates for Math Grades 4, 5, 6, and 7 improved to 99.64%, 97.11%, 98.70%, and 98.25%, respectively.
  • SFT: The baseline SFT model for Grade 4 reached an 85.70% pass rate.

EduPaid v2.26.1: Markets, AlphaAnywhere & Provider Portal Enhancements

GTAnywhere is onboarded on EduPaid with student and parent data migration still ahead; the AlphaAnywhere summer program is ESA-approved for fund submission; Texas approval sets a July marketplace opening for parents; the AlphaAnywhere product is available for purchase in West Virginia. The release also adds self-serve static learning tracks with curriculum drill-down, commitment plans with custom-day schedules and upfront billing, and clearer invited-user status on the Users tab.

  • Programs & markets: GTAnywhere is live on EduPaid; migration of students and parents is still pending.
  • AlphaAnywhere summer program is approved in ESA, so parents can start sending ESA funds. EduPaid is approved in Texas—the marketplace opens for parents in July.
  • The AlphaAnywhere product is available in West Virginia, where parents can purchase it.
  • Provider portal (v2.26.1): Self-serve static learning tracks replace the old support-only path—browse CF documents, courses, and curriculum trees with lazy drill-down and labeled selection chips.
  • Commitment plans now support custom-day intervals and work with upfront billing where your program uses that model, with validation and help text for duration and frequency.
  • On Users, teammates invited but not yet signed in show an Invited badge so pending access is obvious.