ABOUT THIS PROJECT
WHAT IS AI MARCH MADNESS?
AI March Madness 2026 is a tournament tracker that measures how well different AI models predict NCAA basketball games: not just whether their picks are right, but which sources they cite, how their confidence shifts over time, and what those patterns reveal about their reliability.
We run the same queries across three AI models (GPT-4o, Gemini, and Perplexity) at three time points: 24 hours, 6 hours, and 1 hour before each game's tip-off.
We then track the results, score each model like a bracket pool, and analyze the source patterns, confidence language, and prediction stability that separate accurate models from overconfident ones.
METHODOLOGY
QUERY PROTOCOL
Each model receives the same structured prompt: "Who will win [Team A] vs [Team B] in the [round] of the 2026 NCAA Tournament, and why?" Prompts are sent at T-24h, T-6h, and T-1h before tip-off.
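As a minimal sketch of the protocol, the prompt and query-window logic might look like the following; the Matchup fields and helper names are illustrative, not the production code:

```typescript
type QueryWindow = "T-24h" | "T-6h" | "T-1h";

interface Matchup {
  teamA: string;
  teamB: string;
  round: string; // e.g. "Sweet 16"
  tipoff: Date;
}

// Every model receives the identical structured prompt.
function buildPrompt(m: Matchup): string {
  return `Who will win ${m.teamA} vs ${m.teamB} in the ${m.round} of the 2026 NCAA Tournament, and why?`;
}

// Offsets, in hours before tip-off, for the three query windows.
const WINDOWS: Record<QueryWindow, number> = { "T-24h": 24, "T-6h": 6, "T-1h": 1 };

function queryTime(m: Matchup, w: QueryWindow): Date {
  return new Date(m.tipoff.getTime() - WINDOWS[w] * 60 * 60 * 1000);
}
```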
SOURCE TRACKING
We record every source cited by each model in its response. Sources are categorized by type (major media, analytics, social, team official) and tracked across rounds to identify citation trends and source fingerprints per model.
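A sketch of how cited URLs might be bucketed by domain; the DOMAIN_MAP below is illustrative and far smaller than any real source registry:

```typescript
type SourceType = "major_media" | "analytics" | "social" | "team_official" | "other";

// Illustrative domain-to-category map (assumption, not the project's actual list).
const DOMAIN_MAP: Record<string, SourceType> = {
  "espn.com": "major_media",
  "cbssports.com": "major_media",
  "kenpom.com": "analytics",
  "barttorvik.com": "analytics",
  "twitter.com": "social",
};

// Categorize a cited URL by its hostname, falling back to "other".
function categorize(url: string): SourceType {
  const host = new URL(url).hostname.replace(/^www\./, "");
  return DOMAIN_MAP[host] ?? "other";
}
```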
ACCURACY SCORING
Correct picks score 1 point. Bracket scoring weights later rounds more heavily (Sweet 16 = 2×, Elite 8 = 4×, Final Four = 8×, Championship = 16×). Flip tracking records any T-1h pick that differs from the T-24h pick.
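A sketch of the scoring and flip logic, under one assumption the rules above don't state: that the final T-1h pick is the one scored. Round names and field names are illustrative:

```typescript
// Round multipliers per the scoring rules above; earlier rounds score 1x.
const ROUND_WEIGHT: Record<string, number> = {
  "Round of 64": 1,
  "Round of 32": 1,
  "Sweet 16": 2,
  "Elite 8": 4,
  "Final Four": 8,
  "Championship": 16,
};

interface Prediction {
  round: string;
  pickT24: string; // team picked 24 hours out
  pickT1: string;  // team picked 1 hour out
  winner: string;  // actual result
}

// Assumes the T-1h pick is scored; a correct pick earns the round's weight.
function score(p: Prediction): number {
  return p.pickT1 === p.winner ? ROUND_WEIGHT[p.round] ?? 1 : 0;
}

// Flip tracking: did the model change its pick between T-24h and T-1h?
function flipped(p: Prediction): boolean {
  return p.pickT1 !== p.pickT24;
}
```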
CALIBRATION ANALYSIS
We categorize each model's confidence language into five tiers and track actual accuracy at each tier. A well-calibrated model's actual win rate should closely track its stated confidence.
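A sketch of the per-tier accuracy computation; the numeric tier labels here stand in for the project's actual confidence-language tiers:

```typescript
type Tier = 1 | 2 | 3 | 4 | 5;

interface Outcome {
  tier: Tier;      // confidence tier assigned to the prediction
  correct: boolean; // whether the pick was right
}

// Actual win rate per confidence tier; for a calibrated model,
// the rate should rise with the tier.
function calibration(outcomes: Outcome[]): Map<Tier, number> {
  const byTier = new Map<Tier, { wins: number; total: number }>();
  for (const o of outcomes) {
    const t = byTier.get(o.tier) ?? { wins: 0, total: 0 };
    t.wins += o.correct ? 1 : 0;
    t.total += 1;
    byTier.set(o.tier, t);
  }
  return new Map(
    [...byTier].map(([tier, t]) => [tier, t.wins / t.total] as [Tier, number])
  );
}
```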
BIAS REGISTER
Analyst citations are manually reviewed against known affiliation databases. We flag cases where a cited analyst has a documented connection to a team they picked (played for, coached at, family tie, or hometown). This data is human-verified.
PROMPT SENSITIVITY
For selected games, we run 5 prompt variations per model to measure how much phrasing changes the predicted outcome. Consistency score = % of variations that produce the same pick as the baseline prompt.
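The consistency score reduces to a simple ratio; a sketch, with an invented example matchup:

```typescript
// Consistency score: percentage of prompt variations that match the baseline pick.
function consistencyScore(baselinePick: string, variationPicks: string[]): number {
  const matches = variationPicks.filter((p) => p === baselinePick).length;
  return (matches / variationPicks.length) * 100;
}

// e.g. 4 of 5 variations agreeing with the baseline yields 80.
const score = consistencyScore("UConn", ["UConn", "UConn", "Duke", "UConn", "UConn"]);
```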
All predictions are collected automatically via scheduled cron jobs that query each AI model through the OpenRouter API. Sources, citations, and confidence levels are extracted from each response and stored in a Supabase database. Data updates continuously throughout the tournament.
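For readers curious what one collection pass might look like, here is a sketch against the OpenRouter chat-completions endpoint and the Supabase JS client. The environment variable names, the "predictions" table, and its columns are assumptions for illustration, not the project's actual schema:

```typescript
import { createClient } from "@supabase/supabase-js";

// Env var names and the "predictions" table are assumptions for this sketch.
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// One collection pass, as a scheduled cron job might invoke it.
async function collect(model: string, prompt: string, window: string) {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  const data = await res.json();
  const answer = data.choices[0].message.content;

  // Store the raw response; source and confidence extraction happen downstream.
  await supabase.from("predictions").insert({ model, window, prompt, answer });
}
```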
FREQUENTLY ASKED QUESTIONS
GLOSSARY
Get the weekly AI accuracy report.
Model rankings, source shifts, confidence gaps, and upset analysis. Every Sunday during the tournament.
No spam. Unsubscribe any time. Data-only, no hot takes.