Last updated: May 2026
A detailed look at what feeds the model, how it's trained, and how we know whether it's working. No black boxes — every input and every accuracy number on this site is reproducible from the data in our public database.
Each prediction starts as seven numbers about tonight's matchup. The same seven numbers are computed at training, validation, and serving time — there is no hidden feature drift.
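One way to make "the same seven numbers everywhere" hold by construction is to pin the feature order in a single type that training, validation, and serving all pass through. This sketch is hypothetical. The field names and the home-minus-visitor orientation of each input are assumptions made for illustration, not the repository's actual code.

```python
from typing import NamedTuple


class MatchupFeatures(NamedTuple):
    """The seven model inputs in one fixed order (field names are illustrative)."""
    elo_diff: float    # home ELO minus visitor ELO (assumed orientation)
    pd_diff_20: float  # home minus visitor average margin, last 20 games
    pd_diff_10: float  # same, last 10 games
    pd_diff_5: float   # same, last 5 games
    rest_diff: int     # home rest days minus visitor rest days
    home_b2b: int      # 1 if the home team is on the second leg of a back-to-back
    away_b2b: int      # 1 if the visitor is on the second leg of a back-to-back


def to_model_row(f: MatchupFeatures) -> list[float]:
    """The single conversion used at train, validation, and serve time."""
    return [float(v) for v in f]
```

Because every field is required, leaving an input out raises a `TypeError` when the tuple is built, and building it with keyword arguments makes the column mapping explicit.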
ELO is a single number per team that updates after every game — winning teams gain points, losing teams lose them, and the size of the swing depends on the margin of victory and the strength of the opponent. We use FiveThirtyEight's published NBA ELO parameters (K=20, +100 home-court advantage, 75/25 inter-season regression toward 1505).
In our backtest this turned out to be the model's most important feature.
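The ELO update can be sketched as follows. This is a minimal sketch. It assumes the margin-of-victory multiplier FiveThirtyEight published for its NBA model, `(MOV + 3) ** 0.8 / (7.5 + 0.006 * winner_gap)`, and it also assumes the home-court bonus is included in the winner's rating gap. The function names are hypothetical.

```python
K = 20                # update speed
HOME_ADVANTAGE = 100  # ELO points added to the home team
MEAN_RATING = 1505    # league mean that ratings regress toward
CARRYOVER = 0.75      # share of a rating kept between seasons


def expected_home_win(home_elo: float, away_elo: float) -> float:
    """Standard logistic ELO win expectancy, with home-court advantage applied."""
    gap = home_elo + HOME_ADVANTAGE - away_elo
    return 1.0 / (1.0 + 10 ** (-gap / 400))


def mov_multiplier(margin: int, winner_gap: float) -> float:
    """Bigger wins move ratings more; wins by heavy favorites move them less.

    Assumed form of FiveThirtyEight's NBA multiplier. winner_gap is the winner's
    rating minus the loser's, home-court bonus included (assumption).
    """
    return (abs(margin) + 3) ** 0.8 / (7.5 + 0.006 * winner_gap)


def update_elo(home_elo: float, away_elo: float, home_pts: int, away_pts: int) -> tuple[float, float]:
    """Return (new_home_elo, new_away_elo) after one completed game (no ties in the NBA)."""
    home_won = home_pts > away_pts
    gap = home_elo + HOME_ADVANTAGE - away_elo
    winner_gap = gap if home_won else -gap
    surprise = (1.0 if home_won else 0.0) - expected_home_win(home_elo, away_elo)
    shift = K * mov_multiplier(home_pts - away_pts, winner_gap) * surprise
    return home_elo + shift, away_elo - shift  # zero-sum: points move between teams


def regress_between_seasons(elo: float) -> float:
    """75/25 pull toward the league mean before a new season starts."""
    return CARRYOVER * elo + (1 - CARRYOVER) * MEAN_RATING
```

For example, `update_elo(1505, 1505, 110, 100)` moves the home team above 1505 and drops the visitor by the same amount. A 10-point road win by the visitor would move the ratings further, because home court made that result less expected.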
For each team we compute the average margin of victory across their last 20, 10, and 5 completed games. Three different windows give the model different views — the 20-game view is statistically stable, the 5-game view reacts quickly to form changes, and the 10-game view splits the difference.
Why raw point differential rather than a fancier metric? Because it's grounded in games that actually happened, captured directly from final scores, with no API dependency or version-drift risk.
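Here is a sketch of the three rolling windows. It assumes each team's margins are stored oldest-first and include only games completed before tip-off. It also assumes the model receives home-minus-visitor differences, which is how three windows fit into seven inputs. Scoring a team with no completed games as a 0.0 margin is a placeholder of ours and may not match what the repository does.

```python
WINDOWS = (20, 10, 5)


def avg_margin(margins: list[float], n: int) -> float:
    """Mean margin over the last n completed games (fewer if the season is young)."""
    recent = margins[-n:]
    return sum(recent) / len(recent) if recent else 0.0  # 0.0 with no games: placeholder


def point_diff_features(home_margins: list[float], away_margins: list[float]) -> dict[int, float]:
    """Home-minus-visitor average margin for each window (assumed orientation)."""
    return {n: avg_margin(home_margins, n) - avg_margin(away_margins, n) for n in WINDOWS}
```

Each margin is just points scored minus points allowed from the final score. That is what keeps this feature free of third-party API dependencies.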
Rest differential is the home team's days since its last game minus the visitor's, so it is positive when the home team is fresher. Rest is well documented to matter: teams shoot worse and turn the ball over more on zero days of rest.
Two binary inputs — was the home team on the second leg of a back-to-back? Was the visitor? These overlap with rest days but the model gets to learn that B2B specifically is a step-change effect rather than a smooth one.
In the trained model, the visitor B2B flag matters more than the home B2B flag — visitors compound travel fatigue with back-to-back fatigue.
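The rest and back-to-back inputs can be derived from the schedule alone. This sketch counts rest as full days off between games, so a back-to-back is zero days of rest. The 7-day cap, and its use for a team's first game of the season, are hypothetical choices made for illustration.

```python
from datetime import date
from typing import Optional

REST_CAP = 7  # hypothetical: treat long layoffs (and season openers) as 7 days off


def rest_days(last_game: Optional[date], today: date) -> int:
    """Full days off since the previous game; 0 means a back-to-back."""
    if last_game is None:
        return REST_CAP
    return min((today - last_game).days - 1, REST_CAP)


def schedule_features(home_last: Optional[date], away_last: Optional[date], today: date) -> dict[str, int]:
    home_rest = rest_days(home_last, today)
    away_rest = rest_days(away_last, today)
    return {
        "rest_diff": home_rest - away_rest,  # positive when the home team is fresher
        "home_b2b": int(home_rest == 0),     # second leg of a back-to-back
        "away_b2b": int(away_rest == 0),
    }
```

The B2B flags are a function of the rest count. Keeping them as separate inputs still lets the trees split on the flag directly, which is how a step-change effect can be learned.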
The classifier is XGBoost — gradient-boosted decision trees. It's well-suited to tabular numeric data with a handful of inputs, robust against feature scale mismatches, and outputs a probability.
We use three published-best-practice training tricks:
… `features.build_feature_vector`) so the seven inputs are defined identically, with no possibility of a silent train/serve gap.
… `python src/train.py`, and reproduce the model exactly.
Raw XGBoost probabilities don't have to mean what they say. A model might output 70% on games where its pick actually wins only 60% of the time (overconfident), or 60% on games where its pick wins 70% of the time (underconfident). Isotonic regression fixes this by learning a monotonic mapping from "raw model output" to "calibrated probability", fitted on games the model never saw during training.
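To show the mechanics, here is a dependency-free sketch of isotonic calibration using the pool-adjacent-violators algorithm. It is a simplified stand-in and may not be what the repository uses. scikit-learn's `IsotonicRegression` is a common off-the-shelf choice, and it interpolates linearly between blocks where this sketch uses a step function. The inputs are assumed to come from held-out games only: the raw probability for the model's pick, and whether that pick won.

```python
import bisect
from collections import defaultdict


def fit_isotonic(raw_probs: list[float], won: list[int]) -> tuple[list[float], list[float]]:
    """Pool-adjacent-violators: least-squares non-decreasing fit of outcome on raw probability.

    Returns (upper_edges, values): block i covers raw outputs up to upper_edges[i]
    and maps them to the calibrated probability values[i].
    """
    # Pool tied raw outputs first so each distinct value lands in exactly one block.
    totals = defaultdict(lambda: [0.0, 0])
    for p, y in zip(raw_probs, won):
        totals[p][0] += y
        totals[p][1] += 1

    blocks = []  # each block: [sum_of_wins, count, upper_edge]
    for p in sorted(totals):
        wins, count = totals[p]
        blocks.append([wins, count, p])
        # Merge backwards while the previous block's win rate exceeds this one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            wins, count, edge = blocks.pop()
            blocks[-1][0] += wins
            blocks[-1][1] += count
            blocks[-1][2] = edge
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(raw: float, upper_edges: list[float], values: list[float]) -> float:
    """Value of the first block whose upper edge is >= raw; clamps above the last edge."""
    i = bisect.bisect_left(upper_edges, raw)
    return values[min(i, len(values) - 1)]
```

The monotonic constraint means calibration never reverses the model's ordering of games. It only changes how much confidence each raw score is worth.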
The result, measured on our historical backtest of 7,146 games:
| Model's predicted win probability | Actual win rate |
|---|---|
| 54% | 52% |
| 65% | 63% |
| 74% | 72% |
| 85% | 86% |
| 100% | 95% |
Live numbers and the full reliability table are on the Track Record page and refresh daily.
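A reliability table like this one comes from bucketing held-out predictions and comparing each bucket's average forecast with how often the picked team actually won. The bucket edges here are hypothetical. The sketch also assumes each probability is stated from the side of the team the model picked, so it is at least 50%.

```python
def reliability_table(pick_probs, pick_won, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Rows of (bin_low, bin_high, n_games, mean_predicted, actual_win_rate)."""
    rows = []
    last = len(edges) - 2
    for i, (lo, hi) in enumerate(zip(edges, edges[1:])):
        # Half-open bins [lo, hi), except the top bin, which also includes hi (e.g. 100%).
        games = [(p, w) for p, w in zip(pick_probs, pick_won)
                 if lo <= p < hi or (i == last and p == hi)]
        if not games:
            continue  # skip empty bins rather than divide by zero
        mean_pred = sum(p for p, _ in games) / len(games)
        win_rate = sum(w for _, w in games) / len(games)
        rows.append((lo, hi, len(games), mean_pred, win_rate))
    return rows
```

In a well-calibrated model, `mean_predicted` sits close to `actual_win_rate` in every row that holds enough games to be meaningful.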
A "LOCK" isn't the model's most confident pick in isolation. It's a comparison between our model's implied point spread and the Vegas closing spread. When the gap is three points or more, we flag the game. The direction is set by the sign of the gap:
This isn't betting advice; it's a flag that the model meaningfully disagrees with the market. Vegas wins most disagreements, because Vegas is excellent.
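Here is a sketch of the LOCK rule under stated assumptions. This section doesn't describe how the model's probability becomes an implied spread, so the model spread is simply an input. Spreads follow the usual betting convention from the home team's side, where a negative number means the home team is favored. The mapping from the gap's sign to a "HOME" or "AWAY" lean is an assumption about the direction rule.

```python
from typing import Optional

LOCK_THRESHOLD = 3.0  # points: flag when model and market differ by this much or more


def lock_flag(model_home_spread: float, vegas_home_spread: float) -> Optional[str]:
    """Return "HOME", "AWAY", or None.

    Both spreads are quoted for the home team; negative means the home team is favored.
    """
    gap = vegas_home_spread - model_home_spread
    if gap >= LOCK_THRESHOLD:
        return "HOME"  # model rates the home team higher than the market does
    if gap <= -LOCK_THRESHOLD:
        return "AWAY"  # model rates the visitor higher than the market does
    return None
```

For example, if the model makes the home team a 7-point favorite and Vegas closes at 3, the 4-point gap produces a "HOME" flag.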
The model is wrong about a third of the time, straight up: an overall accuracy of 67.1% means it misses roughly 32.9% of games. On games where the model is least confident (50-55%), it's essentially a coin flip and shouldn't be treated as a signal. On games where it is most confident (predicted above 65%), it hits about 75%, which still means it's wrong about one time in four.
Streaks happen. Any string of losses lasting fewer than ~10 games is consistent with the model working as designed and just running cold for a stretch.
We're transparent about this because honest limitations are part of an honest model. The accuracy gap between our 67.1% and Vegas's roughly 68% is almost entirely down to information we can't see.
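How long can a cold run get by chance alone? This sketch computes the exact probability of at least one run of k or more consecutive misses in n games. It makes the simplifying assumption that every game is an independent miss at the overall 32.9% rate, whereas in reality the miss rate varies with the model's confidence. The function name is hypothetical.

```python
def prob_miss_streak(n_games: int, k: int, miss_rate: float = 0.329) -> float:
    """Chance of at least one run of >= k consecutive misses in n independent games."""
    # state[j]: probability that no k-run has happened yet and the current run of misses is j long.
    state = [0.0] * k
    state[0] = 1.0
    streak_seen = 0.0  # probability mass where a k-run has already occurred
    for _ in range(n_games):
        new_state = [0.0] * k
        new_state[0] = sum(state) * (1 - miss_rate)  # a correct pick resets the run
        for j in range(k - 1):
            new_state[j + 1] = state[j] * miss_rate  # another miss extends the run
        streak_seen += state[k - 1] * miss_rate      # a miss that completes a k-run
        state = new_state
    return streak_seen
```

Try `prob_miss_streak(82, 5)`, or a full 1,230-game regular season, to get a feel for how often a streak of a given length shows up somewhere by chance.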
The model gets better when we add features that aren't already captured indirectly. On our short list:
Everything described here is in our public GitHub repository:
Have a feature you think we should add? Email ben.g.ballard@gmail.com.