Trained on — real trades. The strategy wins — of the time and has made —, but the model only scores — accuracy, barely above the — you'd get by guessing "win" every time.
— test accuracy (vs — base rate)
— AUC (0.5 = coin flip at picking winners from losers)
Not a live signal: the model has never traded a dollar
How to read this dashboard
This is a real machine-learning model trained on 550 real options trades — my own actual
theta-selling record, pulled straight from Schwab Gain/Loss. The strategy is genuinely
profitable: it wins about 88.7% of the time. But here is the honest catch, and it's the whole
point of this page: because the strategy wins so often, a model that blindly guesses "win" every
single time already scores ~88.7%. My model's 89.1% test accuracy barely clears that bar.
So read these two numbers side by side (N=…): real data, a real
profitable strategy — but a model that hasn't yet proven it can tell winners from losers any better than
the base rate. The number that actually matters is AUC (0.654), and it says the model is only
modestly good at spotting the rare, large losses. Nothing here is dressed up; where the model
can't honestly show something, you'll see a labelled ghost placeholder, not a fabricated chart.
Bullish weight / profit · Bearish weight / loss / unreliable · Early / experimental · Dead / no signal · Data leakage
How to read these numbers honestly
The trap is the base rate. This strategy wins ~88.7% of the time, so a do-nothing model
that just predicts "win" on every trade is already right ~88.7% of the time. My model's
89.1% test accuracy sits almost exactly on top of that line — accuracy here is
not evidence of skill. The honest yardstick is AUC = 0.654: a 0.5 would mean it
can't tell winning trades from losing ones at all, and 1.0 would be perfect. At 0.654 the model is
modestly better than chance at the one job that matters — flagging the rare, large losses
before they happen — and its 95% interval still brushes the coin-flip line. Real data, real profitable strategy, model not yet proven. Treat it as a lab
notebook, not a signal.
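The base-rate trap is easy to reproduce. The sketch below is illustrative only: the labels are hypothetical (488 wins and 62 losses, chosen to roughly match an 88.7% win rate on 550 trades), and the `accuracy` and `auc` helpers are written for this example rather than taken from the model's code. It shows that an always-"win" predictor scores about 88.7% accuracy while its AUC sits at exactly 0.5.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Chance that a random winner outscores a random loser (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels: 1 = winning trade, 0 = losing trade.
y = [1] * 488 + [0] * 62

# The do-nothing baseline: predict "win" on every trade, same score everywhere.
always_win_pred = [1] * len(y)
always_win_score = [0.9] * len(y)

baseline_acc = accuracy(y, always_win_pred)
baseline_auc = auc(y, always_win_score)
print(f"always-win accuracy = {baseline_acc:.3f}, AUC = {baseline_auc:.3f}")
```

Accuracy rewards the baseline for the strategy's win rate. AUC does not, because a constant score cannot rank any winner above any loser.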
Realised P&L — the real track record
Cumulative realised P&L (every closed trade, ordered by entry)
Per-trade P&L distribution — the fat left tail
Data volume
Total trades
—
— / 50
Sample size is not the bottleneck anymore.
Train / Test split
—
440 train / 110 test — plenty of rows per feature, so train accuracy is meaningful, not memorized.
Is it overfitting?
No. With 440 training rows and only 5 live features, the model has far more data than parameters, the opposite of the tiny-sample regime where logistic regression just memorizes. Train accuracy (88.9%) and test accuracy (89.1%) sit right on top of each other, which is what a fit that isn't overfitting looks like.
Train accuracy
—
N_train = — — within a point of test accuracy, so no memorization
Test accuracy
—
Wilson 95% CI (N=—)
——
Compare to the ~88.7% always-win base rate — the margin is tiny.
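For reference, a Wilson interval can be computed in a few lines. This is a minimal sketch, not the dashboard's own code: the count of 98 correct out of 110 is an assumption (it is the whole number that matches 89.1% on a 110-trade test set), and `wilson_interval` is a name made up for this example.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Assumed counts: 98 correct predictions on the 110-trade test set.
correct, n_test = 98, 110
lo, hi = wilson_interval(correct, n_test)

base_rate = 0.887  # the always-win baseline quoted on this page
print(f"Wilson 95% CI: {lo:.3f} to {hi:.3f}")
print("base rate inside interval:", lo < base_rate < hi)
```

Under those assumed counts the always-win base rate falls inside the interval, which is another way of saying that test accuracy alone cannot separate the model from the baseline.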
One or more top-weighted features are post-trade labels — values I only know after a trade
closes (e.g. exit_reason=BTC_LOSS). Including them encodes the outcome into the input. The model
isn't learning to predict good trades; it's memorizing how trades ended. These must be removed before retraining.
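One way to strip these before retraining is to filter feature names against a list of fields that only exist after a trade closes. The sketch below is an assumption about how that could look: only `exit_reason=BTC_LOSS` comes from this page, while the other field and feature names are hypothetical and the `field=value` naming for one-hot features is assumed.

```python
from typing import List, Set, Tuple

# Fields known only after a trade closes. "exit_reason" is from this page;
# the other two are hypothetical examples.
POST_TRADE_FIELDS: Set[str] = {"exit_reason", "exit_date", "realised_pnl"}


def split_leaky(features: List[str]) -> Tuple[List[str], List[str]]:
    """Return (kept, dropped), treating 'field=value' one-hot names by their field."""
    kept: List[str] = []
    dropped: List[str] = []
    for name in features:
        field = name.split("=", 1)[0]
        (dropped if field in POST_TRADE_FIELDS else kept).append(name)
    return kept, dropped


# Hypothetical feature list for illustration.
features = ["dte", "right=PUT", "exit_reason=BTC_LOSS", "realised_pnl"]
kept, dropped = split_leaky(features)
print("kept:", kept)
print("dropped:", dropped)
```

A name-based filter only catches the fields someone has listed, so it complements a manual check of what was knowable at entry; it does not replace one.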
Feature importance
Signed weights: red = bearish, green = bullish, hatched = leakage
Bias term: —. Bars are scaled to the largest-magnitude weight.
Logistic weights on standardized features — direction and relative pull, not causation.
Dead features — zero weight OR std≈0 (no signal)
—
What "dead" means
Dead = |weight| < 1e-6 OR std ≤ 1e-7 (constant across all training rows). These contributed nothing because the Schwab Gain/Loss export doesn't include market context: VIX, IV-rank, deltas and stock prices are all blank. They'll come alive once I join real market feeds to the fills.
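The dead-feature rule can be written as a small check. This is a sketch under assumptions: it reads "weight" as the absolute value of the weight, uses the population standard deviation of the raw training column, and the feature names and values are hypothetical.

```python
from statistics import pstdev
from typing import Dict, List

WEIGHT_EPS = 1e-6  # threshold on |weight|, from the rule above
STD_EPS = 1e-7     # threshold on the column's standard deviation


def dead_features(weights: Dict[str, float], columns: Dict[str, List[float]]) -> List[str]:
    """Names whose weight is effectively zero or whose training column is constant."""
    dead: List[str] = []
    for name, w in weights.items():
        if abs(w) < WEIGHT_EPS or pstdev(columns[name]) <= STD_EPS:
            dead.append(name)
    return dead


# Hypothetical weights and training columns for illustration.
weights = {"dte": -0.42, "vix": 0.0, "iv_rank": 0.3}
columns = {
    "dte": [7.0, 14.0, 30.0],
    "vix": [0.0, 0.0, 0.0],      # blank in the export, so constant
    "iv_rank": [0.0, 0.0, 0.0],  # constant column counts as dead despite its weight
}
result = dead_features(weights, columns)
print(result)
```

The constant-column test matters on its own: a feature that never varies cannot carry signal, whatever weight the optimizer happens to leave on it.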
Baselines — what is the model actually beating?
Accuracy vs trivial strategies
Win rate by segment
By option right — call vs put
By days-to-expiry bucket
Shadow mode — model vs me
PnL comparison — shadow period
Model refused all trades — N too small to conclude anything.
Agree / partial / disagree
—
full agree
—
partial
—
disagree
The model has been in SHADOW mode — no live orders placed.
The bar this model has to clear — set in advance, not after the fact
The real target isn't sample size (already past it) — it's out-of-sample AUC. Progress toward the 0.70 line:
AUC — / 0.70 target
What would flip this from "not a signal" to a signal — each bar checked live against metrics.json:
If out-of-sample AUC can't clear 0.70 once the dead market-context features are wired in and N passes ~750, the architecture is wrong and NEO-LR gets retired, not re-tuned into looking good.
If the gating policy can't beat the always-trade net P&L after fees on a walk-forward split, the model adds no value over doing nothing and gets shelved.
If calibration stays this far off the diagonal (it just predicts ~89% on everything), the scores aren't decision-grade and won't size a single real dollar.
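The gating test in the second condition comes down to one comparison. The sketch below assumes each out-of-sample trade already carries a score from a model trained only on earlier trades (the walk-forward part is not shown), and the trades, threshold, and flat per-trade fee are all hypothetical.

```python
from typing import List, Tuple

Trade = Tuple[float, float]  # (model score for "win", realised P&L in dollars)


def net_pnl(trades: List[Trade], threshold: float, fee: float) -> Tuple[float, float]:
    """Net P&L after fees for always-trade vs. trading only when score >= threshold."""
    always = sum(pnl - fee for _, pnl in trades)
    gated = sum(pnl - fee for score, pnl in trades if score >= threshold)
    return always, gated


# Hypothetical out-of-sample trades: three small wins and one large loss.
trades = [(0.95, 50.0), (0.92, 40.0), (0.60, -300.0), (0.90, 45.0)]
always, gated = net_pnl(trades, threshold=0.8, fee=1.0)
print(f"always-trade net = {always:.2f}, gated net = {gated:.2f}")
```

In this toy data the gate helps only because it happens to score the one large loss low. With a model whose scores barely separate winners from losers, the gated figure would mostly reflect which trades were skipped by chance.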
Model changelog
Every retrain, appended — date · N · AUC · accuracy
Date (UTC) · N · AUC · Accuracy · Note
The road to a real model
From a simple base-rate model to one with real, measurable skill.
Done
550 real trades ingested
Schwab Gain/Loss history loaded and leakage-checked. Past the 50-trade minimum.
Now
Beat the base rate, not just match it
Accuracy ≈ the 88.7% always-win baseline. The real job: push AUC (now 0.654) above 0.70 out-of-sample.
Next
Wake the dead inputs
Join real VIX / IV-rank / delta feeds to the fills so the model has market context to learn from.
Goal
Out-of-sample, live Schwab
Walk-forward validation on future fills, calibration on the diagonal, AUC that holds up.