Scorpio Docs

Research & Methodology

Scorpio: A Verifiable Framework for Enforcing Socratic Scaffolding in Physics LLMs Beyond Fine-Tuning

We investigate the shift from "Answer Engines" to "Socratic Scaffolding". Our research formalizes a four-layer architecture that enforces verifiable AI behavior at inference time, achieving a 0% Direct Answer Rate and a +0.67-point gain in expert-rated pedagogical quality.

Verifiable Framework: The 4-Layer Architecture

  • 01 Domain: Enforces physics context (refusal logic)
  • 02 Pedagogical: Classifies student intent (scaffolding mode)
  • 03 Notation: Enforces LaTeX and units (scientific syntax)
  • 04 Socratic: Elicits student reasoning (rule validation)
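
One way to picture inference-time layering is as cumulative system-prompt assembly, where each constraint level switches on one more layer. The sketch below is illustrative only: the layer texts and the names `LAYERS`, `LEVELS`, and `build_system_prompt` are assumptions made for this example, not Scorpio's actual prompts or code. The level names follow the constraint levels reported under System Performance.

```python
# Hypothetical layer instructions; the real Scorpio prompts are not shown in this document.
LAYERS = {
    "domain": "Only discuss physics. Politely refuse off-topic requests.",
    "pedagogical": "Classify the student's intent and pick a scaffolding mode; never state the final answer.",
    "notation": "Write all mathematics in LaTeX and state units explicitly.",
    "socratic": "Ask guiding questions that elicit the student's own reasoning.",
}

# Each constraint level enables a cumulative prefix of the four layers.
LEVELS = {
    "NONE": [],
    "DOMAIN": ["domain"],
    "PEDAGOGY": ["domain", "pedagogical"],
    "NOTATION": ["domain", "pedagogical", "notation"],
    "FULL": ["domain", "pedagogical", "notation", "socratic"],
}


def build_system_prompt(level: str) -> str:
    """Join the instructions of every layer enabled at the given level."""
    return "\n".join(LAYERS[name] for name in LEVELS[level])
```

Because the levels are cumulative, comparing two adjacent levels changes exactly one layer, which is what lets an ablation attribute a change in behavior to that layer.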

Mathematical Fidelity & Notation

Scorpio uses a custom-built LaTeX engine designed for physics pedagogy. From complex integrals to 4-vector notation, our interface ensures symbols are rendered with publication-grade precision.

Key Features

  • Intuitive Math Builder UI
  • Real-time KaTeX Syntax Validation
  • Waypoints Reference System
  • Dynamic Preview & Correction

Cost Transparency & Scaling

We charge zero markup on AI costs. Most EdTech companies mark up API costs 300–500%. We charge you exactly what Google charges us.

Infrastructure

Always-on AI Tutor, Department Waypoints, and Mastery Analytics come standard with every organizational license.

Zero Markup

$0.15/1M input and $0.60/1M output tokens — fixed at Google DeepMind Gemini 2.5 Flash rates.
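
As a rough illustration of what zero markup means per request, the quoted rates can be applied directly to token counts. This is a minimal sketch: the function name `token_cost` and the session size are hypothetical, and only the two rates come from the text above.

```python
INPUT_RATE_PER_M = 0.15   # USD per 1M input tokens (rate quoted above)
OUTPUT_RATE_PER_M = 0.60  # USD per 1M output tokens (rate quoted above)


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Pass-through cost in USD for one request, with no markup applied."""
    return (input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M) / 1_000_000


# Hypothetical tutoring exchange: 2,000 input tokens and 500 output tokens.
session_cost = token_cost(2_000, 500)
print(f"${session_cost:.4f}")
```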

Enterprise Scale

District-wide deployment with custom SSO, dedicated infrastructure, and tiered volume pricing.

Total Monthly Cost Comparison:
Our AI vs Industry Education AI

Flat $4.99 network fee vs per-student pricing ($5–$12/student)

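| Students | Scorpio Total | Industry Low | Industry High | Our $/Student | Industry $/Student | Savings vs Low |
|---|---|---|---|---|---|---|
| 10 | $5.64 | $50 | $120 | $0.56 | $5.00 – $12.00 | 89% cheaper |
| 50 | $8.23 | $250 | $600 | $0.16 | $5.00 – $12.00 | 97% cheaper |
| 100 | $11.47 | $500 | $1,200 | $0.11 | $5.00 – $12.00 | 98% cheaper |
| 250 | $21.19 | $1,250 | $3,000 | $0.08 | $5.00 – $12.00 | 98% cheaper |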
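
The totals in the table are consistent with a simple linear model: the flat $4.99 network fee plus a constant AI usage cost per student. The sketch below assumes a usage cost of $0.0648 per student per month; that figure is inferred by subtracting the fee from the table's totals and is not a rate stated in this document. The function names are likewise illustrative.

```python
NETWORK_FEE = 4.99          # flat monthly fee in USD (stated above)
USAGE_PER_STUDENT = 0.0648  # ASSUMPTION: inferred from the table, not a published rate
INDUSTRY_LOW = 5.00         # USD per student per month (low end of the stated range)


def scorpio_total(students: int) -> float:
    """Monthly total in USD under the assumed linear model."""
    return round(NETWORK_FEE + USAGE_PER_STUDENT * students, 2)


def savings_vs_low(students: int) -> float:
    """Percentage saved relative to the low-end industry per-student price."""
    industry_total = INDUSTRY_LOW * students
    return round(100 * (1 - scorpio_total(students) / industry_total), 1)


for n in (10, 50, 100, 250):
    print(n, scorpio_total(n), savings_vs_low(n))
```

Under this assumption the flat fee dominates at small enrollments, which is why the effective per-student rate falls as the number of students grows.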

  • 98.3% max cost reduction, at 250 students vs industry low
  • $11.47 our cost at 100 students, where the industry charges $500 – $1,200
  • $0.08 effective per-student rate, at 250 students total

Key Findings

Direct Answer Rate (DAR)

Critical

Reduced from 100% (NONE) to 0% (FULL) in all procedural physics problems

Eliminates answer-harvesting and forces productive struggle

Expert Validation

High

Independent Ph.D. audit confirmed a +0.67 point gain in pedagogical quality for the FULL stack

Corroborates framework efficacy with a rating that is independent of the developers' self-assessment

Notation Accuracy

High

LaTeX mathematical density peaked at 0.92 per 100 words with explicit notation constraints

Ensures professional academic standards and symbolic clarity

Socratic Engagement

Medium

FULL stack achieved 1.25 questions per response (vs. 1.00 for NONE and 0.50 for DOMAIN)

More inquiry-based student interaction than at any other constraint level
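
The metrics behind these findings can be approximated with simple text statistics. The sketch below is not the study's scoring code: the function names are hypothetical, the direct-answer flags are assumed to come from a separate labeling step, counting question marks is a crude proxy for questions asked, and the LaTeX density definition (inline `$...$` spans per 100 words) is an assumption.

```python
import re


def direct_answer_rate(gave_direct_answer: list[bool]) -> float:
    """Percentage of responses flagged as stating the final answer outright."""
    return 100 * sum(gave_direct_answer) / len(gave_direct_answer)


def questions_per_response(responses: list[str]) -> float:
    """Mean number of question marks per response (a crude proxy)."""
    return sum(r.count("?") for r in responses) / len(responses)


def latex_density(response: str) -> float:
    """Inline $...$ spans per 100 whitespace-separated words (assumed definition)."""
    spans = re.findall(r"\$[^$]+\$", response)
    words = response.split()
    return 100 * len(spans) / len(words) if words else 0.0
```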

System Performance

| Constraint Level | Description | Domain Adh. | DAR | LaTeX % | Avg Q's | Quality |
|---|---|---|---|---|---|---|
| NONE | Baseline Gemini 2.5 Flash, no constraints | 100.0% | 100.0% | 0.22 | 1.00 | 4.38 |
| DOMAIN | Physics domain restriction only | 100.0% | 100.0% | 0.35 | 0.50 | 4.50 |
| PEDAGOGY | Domain + response classification | 100.0% | 0.0% | 0.28 | 1.12 | 3.88 |
| NOTATION | Domain + pedagogy + LaTeX/unit enforcement | 100.0% | 0.0% | 0.88 | 1.00 | 4.12 |
| FULL | Complete Socratic tutoring stack | 100.0% | 0.0% | 0.92 | 1.25 | 4.62 |

Performance by Academic Tier (FULL Stack)

| Difficulty Tier | Ped. Quality | On-Topic Adh. % | Avg Response (Ch) |
|---|---|---|---|
| Basic | 3.88 | 80.0% | 477 |
| Intermediate | 4.20 | 100.0% | 689 |
| Advanced | 4.17 | 100.0% | 2323 |
| College | 3.75 | 100.0% | 3316 |

Independent Expert Validation

Expert Agreement
0.51 Cohen's Kappa (κ)
Moderate Agreement

Inter-rater reliability analysis on a stratified 30-item subset indicates moderate agreement (κ = 0.51) between framework developers and Ph.D. physics educators.
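
For readers unfamiliar with the statistic, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The following is a minimal sketch of the standard formula for categorical labels; the function name and the toy ratings are illustrative and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items where both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use a single label")
    return (p_observed - p_expected) / (1 - p_expected)


# Toy example with four items and binary labels.
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

On the commonly used Landis and Koch scale, values between 0.41 and 0.60 are described as moderate agreement.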

Pedagogical Improvement

Validated by Ph.D. Physics Expert

+0.67 pts

Gain over baseline (FULL vs NONE)

Expert Baseline Score: 3.16
Expert FULL Score: 3.83

Methodology: Ablation Study Design

Our experimental design isolates each layer of the Scorpio constraint architecture to measure its specific contribution to pedagogical effectiveness. We tested a battery of 25 physics questions across 4 difficulty tiers (Basic, Intermediate, Advanced, College) and 3 question types (Conceptual, Procedural, Adversarial). Each question was answered at each of the 5 constraint levels, yielding 125 responses, and each response was scored in 5 blinded criteria passes, for 625 assessments in total.

| Metric | Measurement | Details |
|---|---|---|
| Test Battery | 25 questions | Conceptual (8), Procedural (12), Adversarial (5) |
| Sample Size | 125 responses | 25 questions × 5 constraint levels |
| Difficulty Levels | 4 tiers | Basic, Intermediate, Advanced, College |
| Evaluator Stats | 625 assessments | 125 responses × 5 blinded criteria passes |
| Expert Validation | Ph.D. Audit | Blinded holistic scoring on stratified 30-item subset |
| AI Model | Gemini 2.5 Flash | Inference-time constraint layering (no fine-tuning) |
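
The counts in the table follow from a full factorial design, which can be enumerated in a few lines. This is a sketch for checking the arithmetic: the question identifiers are placeholders, and the variable names are illustrative rather than taken from the study's tooling.

```python
from itertools import product

QUESTION_TYPES = {"Conceptual": 8, "Procedural": 12, "Adversarial": 5}
LEVELS = ["NONE", "DOMAIN", "PEDAGOGY", "NOTATION", "FULL"]
CRITERIA_PASSES = 5

# Placeholder question IDs such as "Procedural-3"; the real battery is not listed here.
questions = [
    f"{qtype}-{i + 1}"
    for qtype, count in QUESTION_TYPES.items()
    for i in range(count)
]

# One response per (question, constraint level) pair.
responses = list(product(questions, LEVELS))

# One assessment per (response, criteria pass) pair.
assessments = list(product(responses, range(CRITERIA_PASSES)))

print(len(questions), len(responses), len(assessments))
```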