How the model works
Plain-English walk-through of every moving part: the algorithm, the trading strategy it watches, how I validate it honestly, and why an 88.6% win rate makes raw accuracy a misleading score — so the real test of skill is AUC. No jargon left unexplained.
Logistic regression is one of the simplest "learning" models there is. You feed it a list of numbers describing a trade (the features): hour of day, days-to-expiry, whether it's a call or a put, and so on. It multiplies each feature by a learned weight, adds them up with a bias term, then squashes the total through an S-shaped curve (the sigmoid) to produce a probability between 0 and 1.
p(win) = sigmoid(z) = 1 / (1 + e^(-z)), where z = bias + the sum of (weight × feature) over all features
A positive weight means "more of this feature → more likely a win"; a negative weight means the opposite. Training just nudges the weights, over and over, to make the predicted probabilities line up with what actually happened across 545 real closed trades. That's the whole trick — it finds linear correlations. It cannot learn complex interactions, which is one reason its skill is limited (see the base-rate problem below).
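The prediction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names, feature values, weights, and bias below are all made up for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_win_probability(features, weights, bias):
    """Multiply each feature by its weight, add the bias, apply the sigmoid."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical numbers for illustration only (not the trained model's weights).
example_features = [0.5, -1.0, 1.0]
example_weights = [0.8, 0.3, -0.2]
example_bias = 1.5

p_win = predict_win_probability(example_features, example_weights, example_bias)
```

Here the weighted sum is z = 1.4, so the sigmoid returns a probability of roughly 0.80. Raising a positively weighted feature pushes z, and therefore p(win), up; raising a negatively weighted one pushes it down.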
The human trader being shadowed sells options for income — a strategy often called theta selling. Here's the plain version:
- STO (sell to open): you sell an option contract you don't own and collect a premium (cash) upfront.
- Theta: the rate at which an option loses value every day just from time passing. If you're the seller, time decay works for you — each day the option is worth a little less, which is good when you've already pocketed the premium.
- BTC (buy to close): you buy the same contract back to exit. If you bought it back for less than you sold it, the difference is your profit.
- The bet: the option expires worthless (or cheap enough to close profitably) before the stock moves against you. You're selling time and probability, not direction.
So the modeling question is: given the conditions at the moment a trade is opened, will it close as a win? The features try to capture those conditions — call vs put, days-to-expiry, premium size, position size, day of week — and the label is simply win or loss by realized P&L.
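As a rough sketch of how a win/loss label could be derived from one STO/BTC round trip: the helper names are hypothetical, the 100-share contract multiplier is the usual convention for US equity options, and fees are ignored. Treating an exact breakeven as a loss is also an assumption of this sketch, not something the write-up specifies.

```python
def realized_pnl(sto_premium: float, btc_cost: float,
                 contracts: int = 1, multiplier: int = 100) -> float:
    """Profit in dollars: premium collected at open minus cost to buy back.

    Prices are per share; `multiplier` converts to dollars per contract.
    Fees and commissions are ignored in this sketch.
    """
    return (sto_premium - btc_cost) * contracts * multiplier


def label_trade(pnl: float) -> int:
    """1 = win, 0 = loss. Breakeven counts as a loss here (an assumption)."""
    return 1 if pnl > 0 else 0


# Sold for 1.25, bought back for 0.25: a winner.
winner = realized_pnl(sto_premium=1.25, btc_cost=0.25)

# Sold for 1.25, bought back for 3.00: the stock moved against the seller.
loser = realized_pnl(sto_premium=1.25, btc_cost=3.00)
```

Note the asymmetry in the two examples: the win is capped at the premium collected, while the loss can be larger than it.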
Here is the trap that makes this whole project worth doing honestly. Theta selling wins often and loses big: across our 545 real trades it won 483 and lost only 62, an 88.6% win rate.
Now imagine the laziest possible "model": it ignores every feature and just predicts win on every single trade. How accurate is it? 88.6% — because that's how often the strategy wins. It did zero work and scored almost 90%.
So what is the right test? AUC — the area under the ROC curve. It ignores the imbalance and measures the only thing that's actually hard and actually valuable here: can the model rank the rare losses below the many wins? An AUC of 0.5 is a coin flip; 1.0 is perfect. Ours is 0.688.
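To make the contrast concrete, here is a small sketch that scores the lazy "always win" model on both metrics, using the 483 wins and 62 losses quoted above. The `auc` helper is a plain pairwise implementation written for illustration (ties count as half); it is an assumption about how one might compute it, not the project's own code.

```python
def accuracy_always_win(wins: int, losses: int) -> float:
    """Accuracy of a model that predicts 'win' for every trade."""
    return wins / (wins + losses)


def auc(scores, labels) -> float:
    """Chance that a random win is scored above a random loss (ties = 0.5)."""
    win_scores = [s for s, y in zip(scores, labels) if y == 1]
    loss_scores = [s for s, y in zip(scores, labels) if y == 0]
    if not win_scores or not loss_scores:
        raise ValueError("AUC needs at least one win and one loss")
    total = 0.0
    for w in win_scores:
        for l in loss_scores:
            if w > l:
                total += 1.0
            elif w == l:
                total += 0.5
    return total / (len(win_scores) * len(loss_scores))


labels = [1] * 483 + [0] * 62
constant_scores = [1.0] * len(labels)  # the lazy model: same score everywhere

baseline_accuracy = accuracy_always_win(483, 62)
baseline_auc = auc(constant_scores, labels)
```

The lazy model keeps its 88.6% accuracy but scores an AUC of exactly 0.5, because a constant score cannot rank any loss below any win. That is the sense in which AUC sees through the imbalance.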
With trades, you cannot shuffle the data and split it randomly the way you would with photos. Trades are ordered in time, and using a future trade to predict a past one is cheating — you'd never have that information live. The honest way to evaluate a trading model is walk-forward:
- Train on the earliest chunk of trades.
- Predict the next chunk (which the model has never seen, and which all happened after training).
- Slide the window forward, retrain, predict the next chunk, and repeat.
- Stitch together every out-of-sample prediction — that's your realistic estimate of live performance.
I run this for real: a 5-fold, time-ordered, expanding-window walk-forward across all 545 trades. The mean out-of-sample accuracy is 0.884 (±0.032). Notice that this lands right on top of the 88.6% base rate — consistent with the base-rate problem above: the model isn't separating itself from "always guess win" by much.
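A minimal sketch of how expanding-window walk-forward splits can be generated is shown below. The function name and the way fold boundaries are chosen (equal chunks, with the last fold absorbing the remainder) are assumptions for illustration; the actual pipeline may size its folds differently.

```python
def expanding_window_splits(n_samples: int, n_folds: int):
    """Yield (train_indices, test_indices) pairs in time order.

    Trades are assumed to be sorted oldest to newest. Each fold trains on
    everything before a cutoff and tests on the chunk right after it, so the
    training window only ever grows and never sees the future.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    chunk = n_samples // (n_folds + 1)
    if chunk < 1:
        raise ValueError("not enough samples for that many folds")
    for k in range(1, n_folds + 1):
        train_end = k * chunk
        # The final fold takes whatever is left over.
        test_end = n_samples if k == n_folds else train_end + chunk
        yield range(0, train_end), range(train_end, test_end)


# Toy example: 12 trades, 3 folds.
toy_splits = [(list(train), list(test))
              for train, test in expanding_window_splits(12, 3)]
```

In the toy example the folds are train 0-2 / test 3-5, then train 0-5 / test 6-8, then train 0-8 / test 9-11. Every test index is later than every training index, which is the property a random shuffle would destroy.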
A single accuracy number is misleading without a sense of how much it could wobble. A confidence interval answers the real question: given what I observed, what's the plausible range for the true value? The Wilson interval is a well-behaved way to compute that for proportions — it doesn't break or give impossible <0% / >100% bounds the way the naive formula does.
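For reference, the Wilson interval is short enough to write out by hand. The sketch below uses z = 1.96 for an approximately 95% interval and applies it to the 483-of-545 win count from earlier purely as an illustration; the function name is mine, and the dashboard's own figures are computed on its own counts.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)
    ) / denom
    return center - half_width, center + half_width


# Illustration: interval around the 88.6% base rate (483 wins in 545 trades).
low, high = wilson_interval(483, 545)

# Edge case where the naive formula misbehaves: 0 successes out of 10.
edge_low, edge_high = wilson_interval(0, 10)
```

Unlike the naive "p plus or minus z standard errors" formula, the bounds stay inside 0 to 1 even at the extremes, and the interval is not centred exactly on the observed proportion.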
The orange line is the 88.6% "always guess win" base rate. The model's accuracy CI sits right around it — the interval includes the base rate, which means I can't even rule out that the model is no better than guessing on accuracy alone. That's exactly why AUC, not accuracy, is the metric to watch.
Every headline number on the dashboard ships with its Wilson CI so the uncertainty stays visible. With 109 real held-out trades the intervals are reasonably tight — the limitation here isn't sample size, it's that the model simply hasn't shown much separation from the base rate yet.
Data leakage is when information that wouldn't be available at prediction time sneaks into the training features. The model then looks brilliant in testing and useless in real life, because it learned to read the answer instead of predicting it. A Schwab Gain/Loss export is full of leaky columns — they describe how each trade ended.
Excluded from the model's inputs: win, pnl, pnl_pct, exit_reason, exit_time, hold_hours, and ml_score, because all of them are only known after a trade closes. A built-in leakage check then re-verifies that none of them survived into the model. metrics.json reports zero leakage features in NEO-LR-v0.1. Every input the model sees is information you'd actually have at the moment you open the trade.
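A leakage check of this kind can be very simple. The sketch below is an assumption about what such a check might look like, not the project's actual implementation: the function names and the example feature names are hypothetical, while the blocked column names are the ones listed above.

```python
# Columns that are only known after a trade closes (from the list above).
LEAKY_COLUMNS = frozenset({
    "win", "pnl", "pnl_pct", "exit_reason",
    "exit_time", "hold_hours", "ml_score",
})


def find_leaky_features(feature_names):
    """Return any feature names that match a known post-close column."""
    return sorted(set(feature_names) & LEAKY_COLUMNS)


def assert_no_leakage(feature_names):
    """Raise if a post-close column made it into the model's inputs."""
    leaked = find_leaky_features(feature_names)
    if leaked:
        raise ValueError(f"leaky features present: {leaked}")


# Hypothetical feature lists for illustration.
clean_features = ["hour_of_day", "days_to_expiry", "is_call"]
dirty_features = ["hour_of_day", "hold_hours", "pnl_pct"]

assert_no_leakage(clean_features)              # passes silently
leaked = find_leaky_features(dirty_features)   # ['hold_hours', 'pnl_pct']
```

A name-based check like this only catches the known offenders. A column derived from exit data under a different name would slip through, which is why the leakage tells below are still worth watching.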
Classic leakage tells: a single feature dominating, suspiciously perfect accuracy, and metrics that fall apart on fresh out-of-sample data. Our modest AUC of 0.688 is, ironically, reassuring here — it's the honest score of a model that isn't peeking at the answer.
The data is real and the strategy is profitable. What's missing is proof the model adds skill. The next moves are about beating the base rate, not collecting more rows:
- Revive the dead features. 9 of 14 inputs (vix, iv_rank, delta, intraday returns, …) are empty because no market-feed data is wired in. These are exactly the signals most likely to flag a losing trade, so adding them is the single biggest lever on AUC.
- Retrain and re-measure AUC. The question is whether AUC climbs meaningfully above today's 0.688. Accuracy will stay near the base rate either way and tells us little.
- Walk-forward, always. Keep evaluating with time-ordered, expanding-window validation, never random shuffles.
- Report with intervals. Every headline metric ships with its Wilson CI, so uncertainty stays visible.
- Compare honestly. Only if out-of-sample AUC clears the base-rate floor with a CI that stays above it would I consider whether the model adds anything over the human trader. Until then it places no orders and is not a live trading signal.