TRAINED ON 545 REAL TRADES  ·  logistic regression  ·  NOT A LIVE TRADING SIGNAL
Methodology

How the model works

Plain-English walk-through of every moving part: the algorithm, the trading strategy it watches, how I validate it honestly, and why an 88.6% win rate makes raw accuracy a misleading score — so the real test of skill is AUC. No jargon left unexplained.

Logistic regression is one of the simplest "learning" models there is. You feed it a list of numbers describing a trade (the features): hour of day, days-to-expiry, whether it's a call or a put, and so on. It multiplies each feature by a learned weight, adds them up with a bias term, then squashes the total through an S-shaped curve (the sigmoid) to produce a probability between 0 and 1.

z = bias + (w₁·feature₁) + (w₂·feature₂) + … + (w₁₄·feature₁₄)
p(win) = sigmoid(z) = 1 / (1 + e^(−z))

A positive weight means "more of this feature → more likely a win"; a negative weight means the opposite. Training just nudges the weights, over and over, to make the predicted probabilities line up with what actually happened across 545 real closed trades. That's the whole trick: it finds a linear relationship between the features and the log-odds of a win. It cannot learn interactions between features unless they are supplied as extra inputs, which is one reason its skill is limited (see the base-rate problem below).
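To make the weighted-sum-then-sigmoid step concrete, here is a minimal sketch in plain Python. The function names and the three example features, weights, and bias are hypothetical values chosen for illustration; they are not the trained model's parameters.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real number into (0, 1), avoiding overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_win_probability(features, weights, bias):
    """Weighted sum of the features plus a bias, passed through the sigmoid."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical inputs: made-up numbers, not the real model's weights.
example_features = [1.0, 0.5, -2.0]
example_weights = [0.8, -0.4, 0.1]
example_bias = 1.5

p_win = predict_win_probability(example_features, example_weights, example_bias)
print(f"p(win) = {p_win:.3f}")
```

Flipping the sign of a weight flips the direction in which its feature pushes the probability, which is what makes the weights readable.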

WHY THIS MODEL, FOR NOW
Logistic regression is interpretable — you can read every weight and see what the model "thinks" matters. That transparency is the point: before reaching for anything fancier, I want to know exactly how hard the problem is and what a simple model can and can't do. Once market-feed features are wired in, a more flexible model can be tried and compared honestly.

The human trader being shadowed sells options for income — a strategy often called theta selling. Here's the plain version:

So the modeling question is: given the conditions at the moment a trade is opened, will it close as a win? The features try to capture those conditions — call vs put, days-to-expiry, premium size, position size, day of week — and the label is simply win or loss by realized P&L.

Here is the trap that makes this whole project worth doing honestly. Theta selling wins often and loses big: across our 545 real trades it won 483 and lost only 62, an 88.6% win rate.

Now imagine the laziest possible "model": it ignores every feature and just predicts win on every single trade. How accurate is it? 88.6% — because that's how often the strategy wins. It did zero work and scored almost 90%.

WHY ACCURACY LIES HERE
My model's test accuracy is 0.890. Against the 88.6% "always guess win" baseline, that's a rounding error. A high accuracy number on an imbalanced dataset like this proves almost nothing — you have to ask whether the model beats the base rate, and ours barely does.
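The arithmetic behind that comparison is short enough to check by hand. This sketch uses the win and loss counts and the test accuracy reported on this page; the function name is my own. One caveat: the base rate here is computed over all 545 trades, while the test accuracy comes from the held-out subset, so the comparison is approximate.

```python
def always_win_accuracy(wins: int, losses: int) -> float:
    """Accuracy of a 'model' that predicts win for every trade."""
    total = wins + losses
    if total == 0:
        raise ValueError("need at least one trade")
    return wins / total


base_rate = always_win_accuracy(wins=483, losses=62)
reported_accuracy = 0.890  # test accuracy quoted on this page
margin = reported_accuracy - base_rate

print(f"base rate: {base_rate:.3f}")
print(f"margin over 'always guess win': {margin:+.3f}")
```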

So what is the right test? AUC — the area under the ROC curve. It ignores the imbalance and measures the only thing that's actually hard and actually valuable here: can the model rank the rare losses below the many wins? An AUC of 0.5 is a coin flip; 1.0 is perfect. Ours is 0.688.
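One way to see why AUC is immune to the imbalance: it can be computed as the fraction of (win, loss) pairs in which the win received the higher score, with ties counted as half. The sketch below implements that pairwise definition directly; the function name and the toy labels and scores are hypothetical, and a real pipeline would more likely use a library routine.

```python
def roc_auc(labels, scores):
    """Share of (win, loss) pairs where the win is scored higher; ties count 0.5."""
    win_scores = [s for y, s in zip(labels, scores) if y == 1]
    loss_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not win_scores or not loss_scores:
        raise ValueError("AUC needs at least one win and one loss")
    credit = 0.0
    for w in win_scores:
        for l in loss_scores:
            if w > l:
                credit += 1.0
            elif w == l:
                credit += 0.5
    return credit / (len(win_scores) * len(loss_scores))


# Toy data: four wins and one loss (made up for illustration).
labels = [1, 1, 1, 1, 0]

# A lazy model that gives every trade the same score is 80% "accurate"
# here if it always says win, yet it cannot rank anything.
lazy_auc = roc_auc(labels, [0.9, 0.9, 0.9, 0.9, 0.9])

# A model that scores the loss below every win ranks perfectly.
ranked_auc = roc_auc(labels, [0.9, 0.8, 0.7, 0.6, 0.2])

print(lazy_auc, ranked_auc)
```

The lazy model earns only half credit on every pair, so its AUC is 0.5 no matter how lopsided the classes are.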

THE HONEST SUMMARY
Real data. A genuinely profitable strategy (+$81,955.09 realized). But a model whose only demonstrated skill is a modest AUC of 0.688 — better than chance, far from proven. The strategy is the edge; the model has not yet been shown to add one.

With trades, you cannot shuffle the data and split it randomly the way you would with photos. Trades are ordered in time, and using a future trade to predict a past one is cheating — you'd never have that information live. The honest way to evaluate a trading model is walk-forward:

  1. Train on the earliest chunk of trades.
  2. Predict the next chunk (which the model has never seen, and which all happened after training).
  3. Slide the window forward, retrain, predict the next chunk, and repeat.
  4. Stitch together every out-of-sample prediction — that's your realistic estimate of live performance.

I run this for real: a 5-fold, time-ordered, expanding-window walk-forward across all 545 trades. The mean out-of-sample accuracy is 0.884 (± 0.032). Notice that this lands right on top of the 88.6% base rate, consistent with the base-rate problem above: the model isn't separating itself from "always guess win" by much.
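A minimal sketch of how expanding-window, time-ordered folds can be generated. The chunking rule (equal chunks, with the last test fold absorbing the remainder) and the function name are my assumptions for illustration; the dashboard's actual fold boundaries may differ.

```python
def expanding_window_splits(n_samples: int, n_folds: int):
    """Yield (train_indices, test_indices) pairs for walk-forward validation.

    Trades are assumed to be sorted by open time. Each fold trains on
    everything before the test chunk, so training data only ever grows
    and never includes a trade from the future.
    """
    if n_folds < 1:
        raise ValueError("n_folds must be at least 1")
    chunk = n_samples // (n_folds + 1)
    if chunk == 0:
        raise ValueError("not enough samples for this many folds")
    for k in range(1, n_folds + 1):
        train_end = k * chunk
        # The final fold takes whatever trades are left over.
        test_end = n_samples if k == n_folds else train_end + chunk
        yield range(0, train_end), range(train_end, test_end)


for train_idx, test_idx in expanding_window_splits(n_samples=545, n_folds=5):
    print(f"train on trades 0..{train_idx[-1]}, "
          f"test on trades {test_idx[0]}..{test_idx[-1]}")
```

In each fold every test index is larger than every train index, which is the property a random shuffle would destroy.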

A single accuracy number is misleading without a sense of how much it could wobble. A confidence interval answers the real question: given what I observed, what's the plausible range for the true value? The Wilson interval is a well-behaved way to compute that for proportions — it doesn't break or give impossible <0% / >100% bounds the way the naive formula does.

Test accuracy, with its real Wilson 95% CI — and the base-rate line
Test accuracy 0.890 (109 held-out trades) — Wilson 95% CI [0.817, 0.936]

The orange line is the 88.6% "always guess win" base rate. The model's accuracy CI sits right around it — the interval includes the base rate, which means I can't even rule out that the model is no better than guessing on accuracy alone. That's exactly why AUC, not accuracy, is the metric to watch.

Every headline number on the dashboard ships with its Wilson CI so the uncertainty stays visible. With 109 real held-out trades the intervals are reasonably tight — the limitation here isn't sample size, it's that the model simply hasn't shown much separation from the base rate yet.
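For reference, the Wilson score interval can be computed in a few lines. In this sketch the count of 97 correct predictions is my assumption, inferred from the reported 0.890 accuracy on 109 held-out trades; the function name is also my own.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Assumed count: 97 of 109 correct, which rounds to the reported 0.890.
low, high = wilson_interval(successes=97, n=109)
base_rate = 483 / 545

print(f"Wilson 95% CI: [{low:.3f}, {high:.3f}]")
print(f"base rate inside interval: {low <= base_rate <= high}")
```

Unlike the naive p ± z·sqrt(p(1−p)/n) formula, the bounds stay inside [0, 1] even when the observed proportion is 0 or 1.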

Data leakage is when information that wouldn't be available at prediction time sneaks into the training features. The model then looks brilliant in testing and useless in real life, because it learned to read the answer instead of predicting it. A Schwab Gain/Loss export is full of leaky columns — they describe how each trade ended.

OUR LEAKAGE GUARD
Before training, the pipeline drops every post-trade column — win, pnl, pnl_pct, exit_reason, exit_time, hold_hours, ml_score — because all of them are only known after a trade closes. A built-in leakage check then re-verifies that none of them survived into the model. metrics.json reports zero leakage features in NEO-LR-v0.1. Every input the model sees is information you'd actually have at the moment you open the trade.
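A minimal sketch of what such a guard can look like. The post-trade column names are the ones listed above; the function names, the example row, and its feature names (`is_call`, `days_to_expiry`) are hypothetical and are not taken from the actual pipeline.

```python
# Columns only known after a trade closes (as listed above).
POST_TRADE_COLUMNS = frozenset(
    {"win", "pnl", "pnl_pct", "exit_reason", "exit_time", "hold_hours", "ml_score"}
)


def drop_post_trade_columns(row: dict) -> dict:
    """Return a copy of the row without any post-trade column."""
    return {k: v for k, v in row.items() if k not in POST_TRADE_COLUMNS}


def find_leaky_features(feature_names) -> list:
    """List any post-trade column that survived into the feature set."""
    return sorted(set(feature_names) & POST_TRADE_COLUMNS)


# Hypothetical raw row; the label ("win") would be kept separately as the
# training target, never as an input feature.
raw_row = {
    "is_call": 1,
    "days_to_expiry": 7,
    "pnl": 152.40,
    "exit_reason": "expired",
    "win": 1,
}

features = drop_post_trade_columns(raw_row)
leaks = find_leaky_features(features)
if leaks:
    raise RuntimeError(f"leaky features present: {leaks}")
print(sorted(features))
```

Running the check as a separate step after the drop means a column added later by mistake still gets caught before training.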

Classic leakage tells: a single feature dominating, suspiciously perfect accuracy, and metrics that fall apart on fresh out-of-sample data. Our modest AUC of 0.688 is, ironically, reassuring here — it's the honest score of a model that isn't peeking at the answer.

The data is real and the strategy is profitable. What's missing is proof the model adds skill. The next moves are about beating the base rate, not collecting more rows:

  1. Revive the dead features. 9 of 14 inputs (vix, iv_rank, delta, intraday returns, …) are empty because no market-feed data is wired in. These are exactly the signals most likely to flag a losing trade — adding them is the single biggest lever on AUC.
  2. Retrain and re-measure AUC. The question is whether AUC climbs meaningfully above today's 0.688. Accuracy will stay near the base rate either way and tells us little.
  3. Walk-forward, always. Keep evaluating with time-ordered, expanding-window walk-forward validation, never random shuffles.
  4. Report with intervals. Every headline metric ships with its Wilson CI, so uncertainty stays visible.
  5. Compare honestly. Only if out-of-sample AUC clears the base-rate floor with a CI that stays above it would I consider whether the model adds anything over the human trader. Until then it places no orders and is not a live trading signal.