Scorpio Docs

Scorpio Research

Investigating the effectiveness of constraint-based AI tutoring systems. Our ablation study demonstrates how a layered architecture of inference-time rules can transform a general-purpose LLM into a specialized Socratic tutor.
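A layered constraint stack of this kind can be pictured as a cumulative list of rules, where each level includes every rule below it. The following is a minimal sketch under that assumption; the rule wording, the `LAYERS` list, and the `build_system_prompt` helper are hypothetical illustrations, not Scorpio's actual prompts or code.

```python
# Hypothetical sketch: rule text is illustrative, not the study's real prompts.
# Each entry adds one layer on top of the layers listed before it.
LAYERS = [
    ("DOMAIN", "Only discuss physics; politely decline other topics."),
    ("PEDAGOGY", "Classify the request first; never state the final answer outright."),
    ("NOTATION", "Write all math in LaTeX and include units."),
    ("FULL", "Reply with guiding questions that lead the student to the next step."),
]


def build_system_prompt(level: str) -> str:
    """Return the cumulative rules for a constraint level ("NONE" yields no rules)."""
    if level == "NONE":
        return ""
    names = [name for name, _ in LAYERS]
    if level not in names:
        raise ValueError(f"unknown constraint level: {level}")
    cutoff = names.index(level) + 1  # include this layer and every layer below it
    return "\n".join(f"- {rule}" for _, rule in LAYERS[:cutoff])


if __name__ == "__main__":
    print(build_system_prompt("NOTATION"))
```

Because each level is a prefix of the next, removing layers one at a time gives the ablation configurations directly.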

Key Findings

Constraint Effectiveness

High

Modular constraint stack enforces 100% domain adherence and raises LaTeX density from 0.22 (NONE) to 0.92 (FULL)

Eliminates off-topic and poorly formatted responses

Direct Answer Prevention

High

Direct Answer Rate (DAR) reduced from 100% (NONE) to 0% (FULL)

Forces productive struggle and guided reasoning

Socratic Engagement

High

FULL stack averages 1.16 questions per response, versus 0.32 for DOMAIN and 1.08 for the unconstrained baseline (NONE)

Significant increase in inquiry-based interaction

Pedagogical Quality

Medium

Quality scores remain high and consistent (3.92/5 FULL, 3.96/5 NONE)

Reliable teaching effectiveness across all tiers

System Performance

| Constraint Level | Description | Domain Adherence | DAR | LaTeX Density | Avg Questions | Quality (/5) |
|---|---|---|---|---|---|---|
| NONE | Baseline Gemini 2.5 Flash, no constraints | 0.0% | 100% | 0.22 | 1.08 | 3.96 |
| DOMAIN | Physics domain restriction only | 100.0% | 100% | 0.35 | 0.32 | 3.98 |
| PEDAGOGY | Domain + response classification | 100.0% | 0.0% | 0.28 | 0.84 | 3.86 |
| NOTATION | Domain + pedagogy + LaTeX/unit enforcement | 100.0% | 0.0% | 0.88 | 1.04 | 4.02 |
| FULL | Complete Socratic tutoring stack | 100.0% | 0.0% | 0.92 | 1.16 | 3.92 |
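One way to read an ablation table like this is to look at how each metric changes when a tier is added on top of the previous one. The sketch below copies the reported values into a dictionary and computes those tier-to-tier differences; the `RESULTS` layout and the `marginal_effects` helper are illustrative names, not part of the study's tooling.

```python
# Values copied from the System Performance table above.
RESULTS = {
    "NONE":     {"dar": 100.0, "latex": 0.22, "avg_q": 1.08, "quality": 3.96},
    "DOMAIN":   {"dar": 100.0, "latex": 0.35, "avg_q": 0.32, "quality": 3.98},
    "PEDAGOGY": {"dar": 0.0,   "latex": 0.28, "avg_q": 0.84, "quality": 3.86},
    "NOTATION": {"dar": 0.0,   "latex": 0.88, "avg_q": 1.04, "quality": 4.02},
    "FULL":     {"dar": 0.0,   "latex": 0.92, "avg_q": 1.16, "quality": 3.92},
}
ORDER = ["NONE", "DOMAIN", "PEDAGOGY", "NOTATION", "FULL"]


def marginal_effects(metric: str) -> dict:
    """Change in `metric` when each tier is added on top of the previous tier."""
    return {
        cur: round(RESULTS[cur][metric] - RESULTS[prev][metric], 2)
        for prev, cur in zip(ORDER, ORDER[1:])
    }


if __name__ == "__main__":
    for metric in ("dar", "latex", "avg_q", "quality"):
        print(metric, marginal_effects(metric))
```

Read this way, the entire drop in DAR arrives with the PEDAGOGY tier, and the largest jump in LaTeX density (+0.60) arrives with the NOTATION tier.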

Performance by Difficulty

| Difficulty | Quality (/5) | Rule Adherence % | Avg Length (Chars) |
|---|---|---|---|
| Basic | 3.72 | 77.8% | 399 |
| Intermediate | 4.05 | 100.0% | 641 |
| Advanced | 4.13 | 100.0% | 1578 |
| College | 3.75 | 100.0% | 3322 |
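Since the difficulty levels contain different numbers of questions, a question-weighted mean gives a fairer single summary of quality than a plain average of the four rows. The sketch below assumes the per-difficulty scores can be weighted by the question counts listed under Methodology (8, 10, 6, 4); that pairing is an assumption made here for illustration, not something the study states.

```python
# Quality scores from the Performance by Difficulty table.
QUALITY = {"Basic": 3.72, "Intermediate": 4.05, "Advanced": 4.13, "College": 3.75}
# Question counts per difficulty from the Methodology section (assumed weights).
COUNTS = {"Basic": 8, "Intermediate": 10, "Advanced": 6, "College": 4}


def weighted_quality(quality: dict, counts: dict) -> float:
    """Question-weighted mean quality across difficulty levels."""
    total = sum(counts.values())
    return sum(quality[level] * counts[level] for level in quality) / total


if __name__ == "__main__":
    print(round(weighted_quality(QUALITY, COUNTS), 2))
```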

Methodology

| Category | Count/Details | Breakdown |
|---|---|---|
| Question Types | 28 total | Conceptual, Procedural, Adversarial |
| Difficulty Levels | 4 levels | Basic (8), Intermediate (10), Advanced (6), College (4) |
| Constraint Levels | 5 configurations | NONE, DOMAIN, PEDAGOGY, NOTATION, FULL |
| Metrics Collected | Direct Answer Rate, LaTeX Density, Question Density, Domain Adherence, Pedagogical Quality | |
| Sample Size | 140 responses | 28 questions × 5 constraint levels |
| AI Model | Gemini 2.5 Flash | Lightweight model, inference-time constraints |
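The exact metric definitions are not spelled out here, so the following is only a rough sketch of how three of them might be approximated. It assumes question density is the mean number of question marks per response, LaTeX density is the fraction of responses containing at least one LaTeX math span, and Direct Answer Rate is computed from per-response labels supplied by a separate judge. All function names and the regular expression are hypothetical.

```python
import re

# Assumed proxy: a math span is $...$, \(...\) or \[...\].
LATEX_PATTERN = re.compile(r"\$[^$]+\$|\\\(.+?\\\)|\\\[.+?\\\]", re.DOTALL)


def question_density(responses: list) -> float:
    """Average number of question marks per response (assumed definition)."""
    return sum(r.count("?") for r in responses) / len(responses)


def latex_density(responses: list) -> float:
    """Fraction of responses with at least one LaTeX math span (assumed definition)."""
    return sum(bool(LATEX_PATTERN.search(r)) for r in responses) / len(responses)


def direct_answer_rate(gave_answer: list) -> float:
    """Fraction of responses labeled as stating the final answer directly."""
    return sum(gave_answer) / len(gave_answer)


if __name__ == "__main__":
    sample = [
        "What forces act on the block? Can you draw them?",
        r"Use $F = ma$ with $m = 2\,\mathrm{kg}$. What is $a$?",
    ]
    print(question_density(sample))   # 3 question marks over 2 responses
    print(latex_density(sample))      # 1 of 2 responses contains LaTeX
    print(direct_answer_rate([False, False, True, False]))
```

Counting question marks is a crude proxy: it cannot distinguish a guiding question from a rhetorical one, so a real evaluation would likely need a classifier or human rating alongside it.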