Research & Methodology
Scorpio: A Verifiable Framework for Enforcing Socratic Scaffolding in Physics LLMs Beyond Fine-Tuning
Investigating the shift from "Answer Engines" to "Socratic Scaffolding". Our research formalizes a four-layer architecture to structure verifiable AI behavior at inference-time, achieving a 0% Direct Answer Rate and significant pedagogical gains.
Verifiable Framework: The 4-Layer Architecture
- **Domain**: enforces physics context (refusal logic)
- **Pedagogical**: classifies student intent (scaffolding mode)
- **Notation**: enforces LaTeX & units (scientific syntax)
- **Socratic**: elicits student reasoning (rule validation)
Mathematical Fidelity & Notation
Scorpio uses a custom-built LaTeX engine designed for physics pedagogy. From complex integrals to 4-vector notation, our interface ensures symbols are rendered with publication-grade precision.
Key Features
- Intuitive Math Builder UI
- Real-time KaTeX Syntax Validation
- Waypoints Reference System
- Dynamic Preview & Correction
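Real-time syntax validation of the kind listed above can be approximated with a cheap pre-render balance check before handing the source to a full renderer such as KaTeX. A sketch under that assumption (not the actual Scorpio validator):

```python
import re

def latex_balanced(src: str) -> bool:
    """Cheap pre-render sanity check: braces and \\begin/\\end pairs must
    balance. Illustrative only; a real validator parses the full grammar."""
    depth = 0
    for ch in src:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:          # closing brace with no opener
                return False
    if depth != 0:                 # unclosed brace
        return False
    # Environments must close in the same order they open.
    begins = re.findall(r"\\begin\{(\w+)\}", src)
    ends = re.findall(r"\\end\{(\w+)\}", src)
    return begins == ends

assert latex_balanced(r"\frac{dp}{dt} = F")
assert not latex_balanced(r"\frac{dp}{dt")
```

Running a check like this on every keystroke lets the editor flag malformed input before the preview pane attempts to render it.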
Cost Transparency & Scaling
We charge zero markup on AI costs. Most EdTech companies mark up API costs 300–500%. We charge you exactly what Google charges us.
Always-on AI Tutor, Department Waypoints, and Mastery Analytics come standard with every organizational license.
$0.15/1M input and $0.60/1M output tokens — fixed at Google DeepMind Gemini 2.5 Flash rates.
District-wide deployment with custom SSO, dedicated infrastructure, and tiered volume pricing.
Total Monthly Cost Comparison: Our AI vs. Industry Education AI
Flat $4.99 network fee vs per-student pricing ($5–$12/student)
| Students | Scorpio Total | Industry Low | Industry High | Our $/Student | Industry $/Student | Savings vs Low |
|---|---|---|---|---|---|---|
| 10 | $5.64 | $50 | $120 | $0.56 | $5.00 – $12.00 | 89% cheaper |
| 50 | $8.23 | $250 | $600 | $0.16 | $5.00 – $12.00 | 97% cheaper |
| 100 | $11.47 | $500 | $1,200 | $0.11 | $5.00 – $12.00 | 98% cheaper |
| 250 | $21.19 | $1,250 | $3,000 | $0.08 | $5.00 – $12.00 | 98% cheaper |
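Under this model, total monthly cost is simply the flat fee plus zero-markup pass-through of token charges at the published rates. A sketch of that arithmetic (the token volumes in the example are hypothetical, not drawn from the table):

```python
FLAT_NETWORK_FEE = 4.99          # flat monthly fee, from the pricing above
INPUT_RATE = 0.15 / 1_000_000    # $ per input token  (Gemini 2.5 Flash rate)
OUTPUT_RATE = 0.60 / 1_000_000   # $ per output token (Gemini 2.5 Flash rate)

def monthly_cost(input_tokens: int, output_tokens: int) -> float:
    """Flat fee plus zero-markup pass-through of AI token costs."""
    return (FLAT_NETWORK_FEE
            + input_tokens * INPUT_RATE
            + output_tokens * OUTPUT_RATE)

# Hypothetical month: 10M input and 5M output tokens across all students.
cost = monthly_cost(10_000_000, 5_000_000)  # 4.99 + 1.50 + 3.00 = 9.49
```

Because the only fixed component is the $4.99 fee, the effective per-student rate falls as enrollment grows, which is why the table's per-student column drops from $0.56 to $0.08.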
- **98.3%** max cost reduction (at 250 students vs. the industry low)
- **$11.47** our cost at 100 students (industry charges $500 – $1,200)
- **$0.08** effective per-student rate (at 250 students total)
Key Findings
Direct Answer Rate (DAR)
**Critical**: Reduced from 100% (NONE) to 0% (FULL) on all procedural physics problems
Eliminates answer-harvesting and forces productive struggle
Expert Validation
**High**: An independent Ph.D. audit confirmed a +0.67-point gain in pedagogical quality for the FULL stack
Statistically validates framework efficacy beyond self-assessment
Notation Accuracy
**High**: LaTeX mathematical density peaked at 0.92 per 100 words under explicit notation constraints
Ensures professional academic standards and symbolic clarity
Socratic Engagement
**Medium**: The FULL stack achieved 1.25 questions per response (vs. 0.50 for DOMAIN only)
Significant increase in inquiry-based student interaction
System Performance
| Constraint Level | Description | Domain Adherence | DAR | LaTeX Density (per 100 words) | Avg Questions | Quality |
|---|---|---|---|---|---|---|
| NONE | Baseline Gemini 2.5 Flash, no constraints | 100.0% | 100.0% | 0.22 | 1.00 | 4.38 |
| DOMAIN | Physics domain restriction only | 100.0% | 100.0% | 0.35 | 0.50 | 4.50 |
| PEDAGOGY | Domain + response classification | 100.0% | 0.0% | 0.28 | 1.12 | 3.88 |
| NOTATION | Domain + pedagogy + LaTeX/unit enforcement | 100.0% | 0.0% | 0.88 | 1.00 | 4.12 |
| FULL | Complete Socratic tutoring stack | 100.0% | 0.0% | 0.92 | 1.25 | 4.62 |
Performance by Academic Tier (FULL Stack)
| Difficulty Tier | Pedagogical Quality | On-Topic Adherence % | Avg Response (chars) |
|---|---|---|---|
| Basic | 3.88 | 80.0% | 477 |
| Intermediate | 4.20 | 100.0% | 689 |
| Advanced | 4.17 | 100.0% | 2323 |
| College | 3.75 | 100.0% | 3316 |
Independent Expert Validation
Inter-rater reliability analysis on a stratified 30-item subset indicates successful pedagogical alignment between framework developers and Ph.D. physics educators.
**+0.67** pedagogical quality gain over baseline (FULL vs. NONE), validated by an independent Ph.D. physics expert
Methodology: Ablation Study Design
Our experimental design isolates each layer of the Scorpio constraint architecture to measure its specific contribution to pedagogical effectiveness. We tested a battery of 25 physics questions spanning 4 difficulty tiers (Basic, Intermediate, Advanced, College) and 3 question types (Conceptual, Procedural, Adversarial). Each question was run once at each of the 5 constraint levels, yielding 125 responses; each response was then scored in 5 blinded criteria passes, for 625 assessments in total.
| Metric | Measurement | Details |
|---|---|---|
| Test Battery | 25 questions | Conceptual (8), Procedural (12), Adversarial (5) |
| Sample Size | 125 responses | 25 questions × 5 constraint levels |
| Difficulty Levels | 4 tiers | Basic, Intermediate, Advanced, College |
| Evaluator Stats | 625 assessments | 125 responses × 5 blinded criteria passes |
| Expert Validation | Ph.D. Audit | Blinded holistic scoring on stratified 30-item subset |
| AI Model | Gemini 2.5 Flash | Inference-time constraint layering (no fine-tuning) |
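The counts in the table follow directly from the crossed design. A sketch of the evaluation grid (question and criterion names are placeholders; the actual battery and rubric are not reproduced here):

```python
from itertools import product

CONSTRAINT_LEVELS = ["NONE", "DOMAIN", "PEDAGOGY", "NOTATION", "FULL"]
# 25-question battery: 8 conceptual, 12 procedural, 5 adversarial.
QUESTIONS = [f"q{i:02d}" for i in range(25)]
# Five blinded scoring criteria (placeholder names).
CRITERIA = ["domain_adherence", "dar", "latex_density",
            "question_count", "quality"]

# One response per (question, constraint level) pair ...
responses = list(product(QUESTIONS, CONSTRAINT_LEVELS))
# ... and one assessment per (response, criterion) pair.
assessments = list(product(responses, CRITERIA))

print(len(responses), len(assessments))  # 125 625
```

Enumerating the grid this way makes the sample-size arithmetic auditable: 25 × 5 = 125 responses and 125 × 5 = 625 assessments, matching the table above.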