Confidence Calibration Curves - AI March Madness 2026
Calibration curves plot each AI model's stated confidence against its actual accuracy across all 2026 NCAA Tournament predictions. A perfectly calibrated model traces the diagonal: when it says 80% confident, it should be correct 80% of the time.
Predictions are grouped into five confidence buckets (55%, 65%, 75%, 85%, 95%), and the actual win rate is computed for each bucket. Most AI models show overconfidence: their high-confidence predictions turn out wrong more often than the stated probability implies.
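A minimal sketch of the bucketing step, assuming each prediction is available as a (stated confidence, was correct) pair; the function and variable names here are illustrative, not the site's actual pipeline.

```python
from collections import defaultdict

# Confidence buckets used for the calibration curve (bucket centers).
BUCKETS = [0.55, 0.65, 0.75, 0.85, 0.95]

def nearest_bucket(confidence):
    """Snap a stated confidence to the nearest bucket center."""
    return min(BUCKETS, key=lambda b: abs(b - confidence))

def calibration_curve(predictions):
    """Group (stated_confidence, was_correct) pairs by bucket and
    return the actual win rate observed in each bucket."""
    outcomes = defaultdict(list)
    for confidence, was_correct in predictions:
        outcomes[nearest_bucket(confidence)].append(was_correct)
    return {
        bucket: sum(results) / len(results)
        for bucket, results in sorted(outcomes.items())
    }

# Example: four hypothetical picks from one model.
picks = [(0.95, True), (0.92, False), (0.68, True), (0.55, False)]
print(calibration_curve(picks))  # {0.55: 0.0, 0.65: 1.0, 0.95: 0.5}
```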
Calibration scores measure the average distance between each bucket's actual win rate and the perfect calibration line; lower scores indicate better calibration. GPT-4o, Gemini 2.5, and Perplexity Sonar Pro are compared across all confidence levels as the tournament progresses from the First Four through the Championship.
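The page describes the score only as the average distance from the perfect calibration line; the sketch below assumes that means the mean absolute deviation of each bucket's observed win rate from the diagonal, with the example curve values invented for illustration.

```python
def calibration_score(curve):
    """Average absolute distance between each confidence bucket's actual
    win rate and the perfect calibration diagonal (win rate == confidence).
    Zero means perfect calibration; larger values mean worse calibration."""
    if not curve:
        return 0.0
    return sum(abs(rate - bucket) for bucket, rate in curve.items()) / len(curve)

# Hypothetical curve: bucket center -> observed win rate in that bucket.
curve = {0.55: 0.52, 0.65: 0.60, 0.75: 0.66, 0.85: 0.71, 0.95: 0.80}
print(calibration_score(curve))  # ~0.092: overconfident, mostly in the high buckets
```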
- Calibration Score: A numeric measure of how well stated confidence matches actual accuracy. Perfect calibration equals zero.
- Confidence Bucket: A grouping of predictions by stated confidence level (55%, 65%, 75%, 85%, 95%) used to compute calibration curves.
- Overconfidence: When a model's stated confidence consistently exceeds its actual accuracy at a given confidence level.