NEO-LR-v0.1

Type: —

Last trained: —

Age: —

Max iters: —

CURRENT HONEST VERDICT · —

Trained on — real trades. The strategy wins — of the time and has made — — but the model only scores — accuracy, barely above the — you'd get by guessing "win" every time.

— test accuracy (vs — base rate) · — AUC (0.5 = coin flip at picking winners from losers) · not a live signal · model has never traded a dollar

How to read this dashboard

This is a real machine-learning model trained on 550 real options trades — my own actual theta-selling record, pulled straight from Schwab Gain/Loss. The strategy is genuinely profitable: it wins about 88.7% of the time. But here is the honest catch, and it's the whole point of this page: because the strategy wins so often, a model that blindly guesses "win" every single time already scores ~88.7%. My model's 89.1% test accuracy barely clears that bar.

So read these two numbers side by side (N=…): real data, a real profitable strategy — but a model that hasn't yet proven it can tell winners from losers any better than the base rate. The number that actually matters is AUC (0.654), and it says the model is only modestly good at spotting the rare, large losses. Nothing here is dressed up; where the model can't honestly show something, you'll see a labelled ghost placeholder, not a fabricated chart.

Legend: Bullish weight / profit · Bearish weight / loss / unreliable · Early / experimental · Dead / no signal · Data leakage

How to read these numbers honestly

The trap is the base rate. This strategy wins ~88.7% of the time, so a do-nothing model that predicts "win" on every trade is already right ~88.7% of the time. My model's 89.1% test accuracy sits almost exactly on that line: with 110 test trades, a single trade is worth about 0.9 points of accuracy, so the 0.4-point gap is smaller than one trade. Accuracy here is not evidence of skill. The honest yardstick is AUC = 0.654, where 0.5 means the model can't tell winning trades from losing ones at all and 1.0 is perfect. At 0.654 the model is modestly better than chance at the one job that matters, flagging the rare, large losses before they happen, and its 95% interval still brushes the coin-flip line. Real data, real profitable strategy, model not yet proven. Treat it as a lab notebook, not a signal.
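To make the base-rate trap concrete, here is a minimal sketch of the always-win baseline. The outcome list is synthetic: 488 wins out of 550 is simply the count implied by an ~88.7% win rate, not the actual trade log, and the `accuracy` helper is illustrative.

```python
def accuracy(predictions, outcomes):
    """Fraction of trades where the predicted label matches the outcome."""
    hits = sum(1 for p, y in zip(predictions, outcomes) if p == y)
    return hits / len(outcomes)


# Synthetic outcomes (assumption): 1 = winning trade, 0 = losing trade.
# 488 of 550 is the count implied by an ~88.7% win rate.
outcomes = [1] * 488 + [0] * 62

# The do-nothing model: predict "win" on every single trade.
always_win = [1] * len(outcomes)

baseline = accuracy(always_win, outcomes)
print(f"always-win accuracy: {baseline:.3f}")
```

Any model has to be judged against that baseline figure, not against 50%.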

Realised P&L — the real track record

Cumulative realised P&L (every closed trade, ordered by entry)

Per-trade P&L distribution — the fat left tail

Data volume

Total trades

—

— / 50

Sample size is not the bottleneck anymore.

Train / Test split

—

440 train / 110 test — plenty of rows per feature, so train accuracy is meaningful, not memorized.

Is it overfitting?

No. With 440 training rows and only 5 live features, the model has far more data than parameters, the opposite of the tiny-sample regime where logistic regression just memorizes. Train accuracy (88.9%) and test accuracy (89.1%) sit within a fraction of a point of each other, which is what a fit that isn't overfitting looks like.

Train accuracy

—

N_train = —; within a point of test accuracy, so no memorization

Test accuracy

—

Wilson 95% CI (N=—)

— —

Compare to the ~88.7% always-win base rate — the margin is tiny.
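For reference, a minimal sketch of the standard Wilson score interval for a proportion. The input of 98 correct out of 110 is an assumption: it is the count implied by 89.1% accuracy on 110 test trades, not a figure read from the dashboard.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Assumed count: 98 of 110 correct, implied by 89.1% test accuracy.
low, high = wilson_interval(98, 110)
print(f"95% CI: {low:.3f} to {high:.3f}")
print("base rate inside interval:", low < 0.887 < high)
```

Under that assumption the interval comfortably contains the ~88.7% base rate, which is why the accuracy margin cannot be read as skill.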

Training freshness

—

trained

now

—today

Model fit indicators

Test AUC · THE NUMBER THAT MATTERS

—

0.5 = no skill, 1.0 = perfect. At 0.654 the model is only modestly good at the thing accuracy can't measure: telling winners from losers.
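One way to see what AUC measures is the pairwise definition: the probability that a randomly chosen winner gets a higher score than a randomly chosen loser, with ties counted as half. The sketch below is a plain-Python illustration on made-up labels and scores, not the dashboard's own metric code.

```python
def auc(scores, labels):
    """Pairwise AUC: P(score of a winner > score of a loser), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one winner and one loser")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


# Made-up example: 6 winners, 2 losers.
labels = [1, 1, 1, 1, 0, 1, 1, 0]

# A model that outputs the same probability for every trade has no ranking skill.
constant_scores = [0.89] * len(labels)
print(auc(constant_scores, labels))

# A model that scores every winner above every loser is perfect.
perfect_scores = [0.9, 0.9, 0.8, 0.95, 0.2, 0.85, 0.9, 0.1]
print(auc(perfect_scores, labels))
```

A model that predicts "win" with the same confidence on every trade scores high accuracy here and an AUC of exactly 0.5, which is the gap between the two metrics in a nutshell.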

Strategy win rate (real Schwab)

—

Wilson 95% CI (N=—)

— —

This is the strategy's edge — and the base rate the model must beat.

Cross-validation

Feature importance

Signed weights — red = bearish, green = bullish, hatched = leakage

Bias term: —. Bars are scaled to the largest-magnitude weight. Logistic weights on standardized features — direction and relative pull, not causation.
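A minimal sketch of how bars can be scaled to the largest-magnitude weight. The feature names and weight values are hypothetical placeholders, not the model's actual coefficients, and `bar_fractions` is an illustrative helper.

```python
def bar_fractions(weights):
    """Scale signed weights into [-1, 1] relative to the largest magnitude."""
    largest = max((abs(w) for w in weights.values()), default=0.0)
    if largest == 0.0:
        return {name: 0.0 for name in weights}
    return {name: w / largest for name, w in weights.items()}


# Hypothetical standardized-feature weights (assumption, for illustration only).
example_weights = {"days_to_expiry": -0.8, "is_put": 0.4, "vix": 0.0}

fractions = bar_fractions(example_weights)
for name, frac in fractions.items():
    side = "bullish" if frac > 0 else "bearish" if frac < 0 else "dead"
    print(f"{name:>15}: {frac:+.2f} ({side})")
```

Because the bars are relative, a full-length bar only says that feature has the largest pull among the ones shown, not that the pull is large in absolute terms.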

Dead features — zero weight OR std≈0 (no signal)

—

What "dead" means

Dead = |weight| < 1e-6 OR std ≤ 1e-7 (constant across all training rows). These features contributed nothing because the Schwab Gain/Loss export doesn't include market context: VIX, IV-rank, deltas and stock prices are all blank. They'll come alive once I join real market feeds to the fills.
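The dead-feature rule can be sketched as a small filter. Feature names and values below are hypothetical, and comparing the absolute value of the weight is an assumption made here because the weights are signed.

```python
def dead_features(weights, stds, weight_eps=1e-6, std_eps=1e-7):
    """Names of features with a near-zero weight or a near-constant column."""
    return sorted(
        name
        for name in weights
        if abs(weights[name]) < weight_eps or stds[name] <= std_eps
    )


# Hypothetical fitted weights and training-set standard deviations.
weights = {"days_to_expiry": -0.42, "is_put": 0.17, "vix": 0.0,
           "iv_rank": 0.0, "delta": 3e-9}
stds = {"days_to_expiry": 11.3, "is_put": 0.49, "vix": 0.0,
        "iv_rank": 0.0, "delta": 0.0}

print(dead_features(weights, stds))
```

A column that is blank in every row has zero standard deviation, so it is caught by the `std` test even before the weight is inspected.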

Baselines — what is the model actually beating?

Accuracy vs trivial strategies

Win rate by segment

By option right — call vs put

By days-to-expiry bucket

Timestamp	Trade ID	My action	Model action	Agreement	My PnL	Model PnL
Loading…

Calibration plot

Predicted probability vs actual outcome

Data quality, pipeline & tests

Data quality

Pipeline

Automated guards

Feature drift & honest backtest

Feature drift (PSI)
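For readers unfamiliar with PSI (Population Stability Index), here is a minimal sketch of the usual formula, the sum over bins of (actual share − expected share) × ln(actual share / expected share). The bin edges, the small floor `eps` for empty bins, and the sample values are assumptions for illustration; the dashboard's own binning is not specified here.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between two samples over shared bin edges."""
    def shares(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                is_last = i == len(edges) - 2
                # Bins are half-open; the last bin also includes its right edge.
                if edges[i] <= v < edges[i + 1] or (is_last and v == edges[i + 1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        if total == 0:
            raise ValueError("no values fall inside the bin edges")
        # Floor empty bins so the logarithm stays defined.
        return [max(c / total, eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical days-to-expiry values at training time and on newer trades.
train_dte = [5, 10, 20, 30, 40, 45]
recent_dte = [30, 35, 40, 45, 50, 55]
edges = [0, 15, 30, 60]

print(psi(train_dte, train_dte, edges))   # identical samples give zero drift
print(psi(train_dte, recent_dte, edges) > 0)
```

Values outside the edges are ignored in this sketch, so the edges should span both samples.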

Backtest — gating policy vs always-trade

Pre-registered kill-criteria

The bar this model has to clear — set in advance, not after the fact

The real target isn't sample size (already past it) — it's out-of-sample AUC. Progress toward the 0.70 line:

AUC — / 0.70 target

What would flip this from "not a signal" to a signal — each bar checked live against metrics.json:

If out-of-sample AUC can't clear 0.70 once the dead market-context features are wired in and N passes ~750, the architecture is wrong and NEO-LR gets retired, not re-tuned into looking good. —
If the gating policy can't beat the always-trade net P&L after fees on a walk-forward split, the model adds no value over doing nothing and gets shelved. —
If calibration stays this far off the diagonal (it just predicts ~89% on everything), the scores aren't decision-grade and won't size a single real dollar. —
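The calibration criterion can be checked with a simple reliability table: bucket the predicted probabilities, then compare each bucket's mean prediction with its realised win rate. The sketch below uses made-up predictions and outcomes; the bin count and the helper name `reliability_table` are assumptions.

```python
def reliability_table(probs, outcomes, n_bins=10):
    """Rows of (bin index, count, mean predicted prob, realised win rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))
    rows = []
    for idx, members in enumerate(bins):
        if not members:
            continue  # empty bins carry no information
        mean_pred = sum(p for p, _ in members) / len(members)
        win_rate = sum(y for _, y in members) / len(members)
        rows.append((idx, len(members), mean_pred, win_rate))
    return rows


# Made-up example: a model that predicts ~89% on every trade.
probs = [0.89] * 10
outcomes = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]

rows = reliability_table(probs, outcomes)
for idx, count, mean_pred, win_rate in rows:
    print(f"bin {idx}: n={count} predicted={mean_pred:.2f} actual={win_rate:.2f}")
```

When every prediction lands in a single bucket, the table has one row: the scores may match the average win rate, but they carry no information for sizing individual trades.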

Model changelog

Every retrain, appended — date · N · AUC · accuracy

Date (UTC)	N	AUC	Accuracy	Note
Loading…

The road to a real model

Next milestone strip

From a simple base-rate model to one with real, measurable skill.

Done

550 real trades ingested

Schwab Gain/Loss history loaded and leakage-checked. Past the 50-trade minimum.

Now

Beat the base rate, not just match it

Accuracy ≈ the 88.7% always-win baseline. The real job: push AUC (now 0.654) above 0.70 out-of-sample.

Wake the dead inputs

Join real VIX / IV-rank / delta feeds to the fills so the model has market context to learn from.

Goal

Out-of-sample, live Schwab

Walk-forward validation on future fills, calibration on the diagonal, AUC that holds up.
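A minimal sketch of walk-forward splitting with an expanding training window, assuming the trades are already sorted by entry time. The window sizes below are hypothetical choices, not a stated plan.

```python
def walk_forward_splits(n_rows, initial_train, test_size):
    """Yield (train_indices, test_indices); each test block follows its training data."""
    start = initial_train
    while start + test_size <= n_rows:
        yield range(0, start), range(start, start + test_size)
        start += test_size


# Hypothetical sizes: 550 trades, first 330 for the initial fit, 110 per test block.
folds = list(walk_forward_splits(n_rows=550, initial_train=330, test_size=110))
for train_idx, test_idx in folds:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
```

Each fold trains only on trades that precede its test block, so no future fill can leak into the fit.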

How to read this dashboard

Realised P&L — the real track record

Data volume

Model fit indicators

Feature importance

Baselines — what is the model actually beating?

Win rate by segment

Shadow mode — model vs me

Per-trade shadow log

Calibration plot

Prediction scatter

Data quality, pipeline & tests

Feature drift & honest backtest

Pre-registered kill-criteria

Model changelog

The road to a real model
