The Gauntlet
Forven's robustness battery — walk-forward, Monte-Carlo, parameter jitter, cost-stress and regime-split — that kills fragile strategies before they reach paper.
The gauntlet is the robustness battery a strategy must survive after it clears the cheap triage gate and before it is allowed to trade — even in simulation. It is one stage of the pipeline: researching → backtesting → quick_screen → gauntlet → paper → live → retired. The public site sometimes calls the whole funnel "the gauntlet"; in the app, gauntlet is the specific stage that proves an edge is real rather than curve-fitted.
A strategy only enters the gauntlet once quick_screen (the public "screen") has confirmed it has distinct out-of-sample evidence and is not an obvious overfit. The gauntlet then runs five validation tests, ordered cheap-first so weak strategies fail fast. Passing earns a strategy a place in paper trading (the public site calls this stage a "candidate"). It does not earn it real capital — that is a separate, much stricter gate.
Forven is a research tool. The gauntlet measures how a strategy behaved on historical and simulated data; it does not predict the future, and nothing here is financial advice. Survival is evidence of discipline, not a promise of profit.
What the gauntlet tests
The battery runs as an async workflow with ordered steps. Before the robustness tests proper, the gauntlet prepares the strategy:
- quick_screen gate: re-checks the overfitting guardrails at entry.
- timeframe sweep: backtests across multiple timeframes (default 1h,4h,1d).
- optimization: a parameter search over the strategy's space.
- apply optimized defaults: writes the best params into the strategy.
- confirmation backtest: re-validates those params on fresh data.
Then the five robustness tests run, in v2 cheap-first order:
| Test | What it proves | Pass criteria (default) |
|---|---|---|
| Walk-forward analysis (WFA) | Edge holds on unseen future data | IS→OOS Sharpe degradation <= 35%; >= 40% of folds profitable; min folds 2 |
| Cost-stress | Edge survives realistic trading costs | Stressed Sharpe degradation <= 60%; live gate also wants stressed Sharpe >= 0.3 |
| Monte-Carlo bootstrap | Tail risk is bounded | >= 65% of resamples profitable; 95th-percentile drawdown <= 40% |
| Regime-split | Edge is not regime-specific luck | Profitable in >= 50% of market regimes |
| Parameter jitter | Params are stable, not a knife-edge | >= 60% of ±5–10% reruns pass |
By default, the required tests are walk_forward, param_jitter, and cost_stress (gauntlet.required_tests). An empty list enforces all five. Walk-forward is mandatory: config that drops it self-heals on load, because without it the gauntlet has no honest out-of-sample signal.
Each test must persist a verdict — pass, fail, or blocked — before the paper-promotion gate runs.
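The required-test rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Forven's code: the function names are hypothetical, and the identifiers monte_carlo and regime_split are assumed by analogy with the three names shown in gauntlet.required_tests.

```python
# Cheap-first order taken from the table above. "monte_carlo" and
# "regime_split" are assumed identifiers; the other three appear in the config.
ALL_TESTS = ["walk_forward", "cost_stress", "monte_carlo", "regime_split", "param_jitter"]
VERDICTS = {"pass", "fail", "blocked"}


def resolve_required_tests(configured):
    """Return the tests that must pass, in cheap-first order."""
    # An empty list means "enforce all five".
    if not configured:
        return list(ALL_TESTS)
    required = set(configured)
    # Walk-forward is mandatory, so restore it if the config dropped it.
    required.add("walk_forward")
    return [name for name in ALL_TESTS if name in required]


def ready_for_paper_gate(required, verdicts):
    """True once every required test has persisted pass, fail, or blocked."""
    return all(verdicts.get(name) in VERDICTS for name in required)
```

For example, a config that lists only param_jitter and cost_stress would still resolve to three tests, with walk_forward restored at the front.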
Walk-forward analysis
Walk-forward is the spine of the battery. The data is split into folds (default 5), each trained on the first 70% (in-sample) and validated on the last 30% (out-of-sample). The strategy is judged on the OOS portion it never saw during tuning.
Folds with fewer than 5 OOS trades are excluded from both the numerator and denominator of the pass rate, so a thin-trading strategy can still pass if the few trades it makes are good. Walk-forward applies realistic costs: 4.5 bps fees and 2.0 bps slippage per trade by default.
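A minimal sketch of the 70/30 split and the thin-fold rule, assuming a fold counts as profitable when its OOS PnL is positive. The Fold record and both function names are hypothetical, and returning None when no fold qualifies is a choice made for this sketch; the source does not say how Forven handles that case.

```python
from dataclasses import dataclass


@dataclass
class Fold:
    oos_trades: int   # number of out-of-sample trades in the fold
    oos_pnl: float    # out-of-sample profit or loss after costs


def split_fold(bars, in_sample_pct=0.70):
    """Split one fold's bars into in-sample (first part) and out-of-sample (rest)."""
    cut = round(len(bars) * in_sample_pct)
    return bars[:cut], bars[cut:]


def fold_pass_rate(folds, min_oos_trades=5):
    """Share of profitable folds, ignoring folds with too few OOS trades.

    Thin folds are dropped from both the numerator and the denominator.
    """
    counted = [f for f in folds if f.oos_trades >= min_oos_trades]
    if not counted:
        return None  # assumption: no qualifying folds means no rate
    profitable = sum(1 for f in counted if f.oos_pnl > 0)
    return profitable / len(counted)
```

With five folds where one has only two OOS trades, the rate is computed over the remaining four, so two profitable folds give 0.5, which clears the default 40% floor.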
Parameter jitter
Jitter perturbs the best parameters by ±5–10% and reruns the full backtest each time. The nominal target is roughly 50 reruns, but a run is capped at 30 iterations and at most 4380 bars per rerun, with a 240-second wall-clock safety net. If most reruns survive, the params describe a real basin of edge rather than a single lucky coordinate. The pass floor is 60%.
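The idea can be sketched as follows. This is an assumption-laden illustration: the perturbation is drawn uniformly between 5% and 10% with a random sign, the passes callback stands in for a full backtest plus its pass criteria, and the bar cap and wall-clock limit are not modeled. All names are hypothetical.

```python
import random


def jitter_params(params, rng, lo=0.05, hi=0.10):
    """Scale each numeric parameter up or down by a random 5-10%."""
    jittered = {}
    for name, value in params.items():
        magnitude = rng.uniform(lo, hi)
        sign = rng.choice((-1.0, 1.0))
        jittered[name] = value * (1.0 + sign * magnitude)
    return jittered


def jitter_pass_rate(params, passes, iterations=30, seed=0):
    """Fraction of jittered reruns for which passes(jittered_params) is true."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(iterations) if passes(jitter_params(params, rng)))
    return hits / iterations
```

A strategy would clear the default floor when jitter_pass_rate returns at least 0.60. Integer parameters such as lookback lengths come back as floats here and would need rounding before a real rerun.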
Cost-stress
Cost-stress doubles fees and slippage across the full window and measures how much Sharpe decays. The edge must survive: degradation <= 60%, and for the strict live gate a stressed Sharpe >= 0.3. Many strategies that look profitable on paper-thin margins die here, which is the point — live trading is not free.
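As a rough sketch of the check, assume degradation is the relative drop in Sharpe from the baseline run to the doubled-cost run. That definition, the function names, and the handling of a non-positive baseline are assumptions made for illustration.

```python
def stressed_costs(fee_bps, slippage_bps, multiplier=2.0):
    """Cost-stress doubles both fees and slippage by default."""
    return fee_bps * multiplier, slippage_bps * multiplier


def sharpe_degradation_pct(base_sharpe, stressed_sharpe):
    """Relative Sharpe decay in percent (assumed definition)."""
    if base_sharpe <= 0:
        return 100.0  # assumption: no positive edge to degrade from
    return max(0.0, (base_sharpe - stressed_sharpe) / base_sharpe * 100.0)


def cost_stress_verdict(base_sharpe, stressed_sharpe,
                        max_degradation_pct=60.0, live_min_stressed_sharpe=0.3):
    """Evaluate the gauntlet threshold and the stricter live-gate threshold."""
    degradation = sharpe_degradation_pct(base_sharpe, stressed_sharpe)
    passes_gauntlet = degradation <= max_degradation_pct
    passes_live = passes_gauntlet and stressed_sharpe >= live_min_stressed_sharpe
    return {
        "degradation_pct": degradation,
        "passes_gauntlet": passes_gauntlet,
        "passes_live": passes_live,
    }
```

A strategy whose Sharpe falls from 0.5 to 0.25 under doubled costs degrades by 50%, which passes the gauntlet threshold but misses the live gate's stressed Sharpe floor of 0.3.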
Monte-Carlo and regime-split
Monte-Carlo resamples the trade sequence to estimate tail risk; at least 65% of resamples must stay profitable and the 95th-percentile drawdown must stay at or under 40% (the paper gate clamps this to 50%). Regime-split partitions trades by market regime — trending up, trending down, range-bound, high-volatility — and requires the strategy to be profitable in at least half of them, so it isn't just a bull-market artifact.
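Both checks can be sketched briefly. The sketch assumes per-trade returns expressed as fractions, compounding equity, 1000 resamples with replacement, and a nearest-rank 95th percentile; none of these details are confirmed by the source, and the function names are hypothetical.

```python
import random


def max_drawdown(returns):
    """Largest peak-to-trough drop of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def monte_carlo(trade_returns, n_resamples=1000, seed=0):
    """Bootstrap the trade sequence; return (share profitable, p95 drawdown)."""
    rng = random.Random(seed)
    n = len(trade_returns)
    profitable, drawdowns = 0, []
    for _ in range(n_resamples):
        sample = rng.choices(trade_returns, k=n)  # resample with replacement
        final = 1.0
        for r in sample:
            final *= 1.0 + r
        if final > 1.0:
            profitable += 1
        drawdowns.append(max_drawdown(sample))
    drawdowns.sort()
    index = min(n_resamples - 1, (95 * n_resamples) // 100)
    return profitable / n_resamples, drawdowns[index]


def regime_profitable_share(pnl_by_regime):
    """Share of market regimes in which total PnL is positive."""
    profitable = sum(1 for pnl in pnl_by_regime.values() if pnl > 0)
    return profitable / len(pnl_by_regime)
```

Under the defaults, a strategy would pass Monte-Carlo when the first value is at least 0.65 and the second at most 0.40, and pass regime-split when regime_profitable_share is at least 0.50.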
The robustness score and verdicts
The gauntlet produces a robustness score on a 0–100 scale and per-test verdicts. The lean gauntlet→paper gate — what the promotion gates page calls the "achievable paper" gate — requires:
- robustness score >= 50/100
- minimum total return >= 0%
- maximum drawdown <= 30%
- minimum OOS profit factor >= 1.05
- minimum OOS Sharpe >= 0 (advisory)
These thresholds are deliberately reachable, even in adverse market conditions, so the funnel keeps moving while still filtering clear losers. Capital safety is enforced later, at the much stricter paper→live gate. That two-gate design — achievable paper, strict live — is intentional: paper costs nothing, so the bar is "not obviously broken"; live risks real money, so the bar is "prove forward edge."
A set of hardcoded floors (_PAPER_GATE_FLOORS) means config can only make these gates stricter, never looser. You cannot relax robustness below 50, MC drawdown above 40%, the WFA fold pass rate below 40%, jitter pass rate below 60%, or trade count below 30.
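The gate and its floors can be sketched like this. The layout of the floor table, the min_trades key, and the metric names are assumptions made for illustration; only the threshold values come from the text above, and the real _PAPER_GATE_FLOORS structure may look different.

```python
# "min" limits may only be raised by config; "max" limits may only be lowered.
FLOORS = {
    "min_robustness_score": ("min", 50),
    "mc_max_dd_p95": ("max", 0.40),
    "wfa_fold_pass_rate_min": ("min", 0.40),
    "param_jitter_pass_rate_min": ("min", 0.60),
    "min_trades": ("min", 30),  # hypothetical key name
}


def clamp_to_floors(config):
    """Return a copy of config where no value is looser than its hard floor."""
    clamped = dict(config)
    for key, (kind, limit) in FLOORS.items():
        value = config.get(key, limit)
        clamped[key] = max(value, limit) if kind == "min" else min(value, limit)
    return clamped


def paper_gate(metrics):
    """Return (rejection_reasons, advisories); no reasons means the gate passes."""
    reasons, advisories = [], []
    if metrics["robustness_score"] < 50:
        reasons.append("robustness score below 50")
    if metrics["total_return"] < 0.0:
        reasons.append("negative total return")
    if metrics["max_drawdown"] > 0.30:
        reasons.append("max drawdown above 30%")
    if metrics["oos_profit_factor"] < 1.05:
        reasons.append("OOS profit factor below 1.05")
    if metrics["oos_sharpe"] < 0.0:
        advisories.append("OOS Sharpe below 0 (advisory only)")
    return reasons, advisories
```

Setting min_robustness_score to 35 in this sketch would be clamped back to 50, while raising it to 70 would be kept as is.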
Running the gauntlet
You run the gauntlet from the strategy lab on a strategy that has already passed quick_screen.
Steps
- Open the lab and select a strategy sitting at the quick_screen stage.
- Confirm it has distinct out-of-sample evidence. Without it, the gauntlet refuses admission with the reason "no distinct OOS evidence".
- Start the gauntlet (the lab queues the workflow, or the autonomous pipeline starts it for you).
- Watch the step progress tracker as each step claims, runs, and either completes or blocks.
- Read the verdict panel: each test shows pass/fail/blocked, plus the composite robustness score.
- If all required tests pass, the strategy advances to paper automatically, unless an operator approval is required, in which case it queues in the approvals panel.
If you prefer the API, the lab calls these endpoints under the hood:
# Check what's left before a manual promotion
curl.exe http://127.0.0.1:8003/api/strategies/$StrategyId/readiness
# Claim, complete, block, or retry individual gauntlet steps
curl.exe -X POST http://127.0.0.1:8003/api/gauntlet/steps/$StepId/retry
What you'll see
In the lab's gauntlet panel, each step renders its status (pending, running, passed, blocked_*, failed_gate) and the validation artifacts it produced — WFA degradation, MC drawdown, jitter pass rate, and so on. The final paper-promotion gate shows either a clear pass or the exact gate-rejection reason, so you always know why a strategy stopped.
When steps block or fail
The gauntlet does not silently retry forever, and it does not loop:
- Gate failure (gate_failure): the strategy genuinely failed a test. No auto-retry. The strategy is demoted back to quick_screen. After three demotions it is redirected to research_only and removed from the tradable pipeline, so repeat failures stop burning compute.
- Transient block (blocked_data, blocked_runtime): a data or runtime hiccup. Retries with exponential backoff (2/4/8/16/30+ minutes) up to eight attempts, then terminal.
- Slot contention (gate_contention): only when optional slot competition is on; the step retries on a 10-minute backoff until a slot frees.
- Manual retry: you can always retry a blocked step yourself from the lab or via the retry endpoint.
A stale step stuck in running past gauntlet.async_result_max_age_minutes (default 60) is recovered automatically so a zombie worker can't jam the workflow.
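The retry policy for blocked and failed steps can be sketched as a small decision function. Several details here are assumptions: the backoff is modeled as doubling from 2 minutes and capped at 30, "eight attempts" is read as eight retries, and the function names and return shape are hypothetical.

```python
def backoff_minutes(retry_number, base=2, cap=30):
    """Delay before the given retry (1-based): 2, 4, 8, 16, then capped at 30."""
    return min(base ** retry_number, cap)


def next_action(reason, retries_done, max_retries=8, contention_minutes=10):
    """Decide what happens to a step that stopped for the given reason.

    Returns (action, delay_minutes); delay is None when nothing is scheduled.
    """
    if reason == "gate_failure":
        return ("demote", None)  # genuine failure: no auto-retry
    if reason in ("blocked_data", "blocked_runtime"):
        if retries_done >= max_retries:
            return ("terminal", None)
        return ("retry", backoff_minutes(retries_done + 1))
    if reason == "gate_contention":
        return ("retry", contention_minutes)  # flat backoff until a slot frees
    raise ValueError(f"unknown reason: {reason}")
```

Under these assumptions, a step blocked on data for the first time is retried after 2 minutes, and the same step becomes terminal once eight retries are used up.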
Tuning the gauntlet
The thresholds live in config under the gauntlet.*, walk_forward.*, and robustness_thresholds.* key groups. A few of the most-touched knobs:
{
  "gauntlet": {
    "required_tests": ["walk_forward", "param_jitter", "cost_stress"],
    "min_robustness_score": 50,
    "max_drawdown_pct": 0.30,
    "wfa_max_degradation": 0.35,
    "mc_max_dd_p95": 0.40
  },
  "walk_forward": {
    "n_folds": 5,
    "in_sample_pct": 0.70,
    "fee_bps": 4.5,
    "slippage_bps": 2.0
  },
  "robustness_thresholds": {
    "monte_carlo_percentile_min": 0.65,
    "param_jitter_pass_rate_min": 0.60,
    "cost_stress_max_degradation_pct": 60.0,
    "regime_split_profitable_min": 0.50,
    "wfa_fold_pass_rate_min": 0.40
  }
}
Drawdown and return values are stored as fractions (0.30 = 30%) and shown as percentages in the UI. See the configuration reference for precedence and the full key list. Remember the floors: tightening these values works; loosening below the hard minimums does not.
Honest caveats
The gauntlet is beta software, and a few rough edges are worth knowing:
- Walk-forward windowing can surface adequacy warnings on thin data (OOS windows under ~30 days at 1h, or sparse folds). Read them; they usually flag a data gap, not a crash.
- Implausible metrics are rejected, not celebrated. A Sharpe >= 5 or profit factor >= 8 on honest crypto data is treated as a look-ahead leak signature and blocked at both quick_screen and gauntlet entry. If a strategy looks too good, the gauntlet assumes it cheated.
- testing_mode bypasses the quick_screen and gauntlet gates for rapid iteration, but it never bypasses the paper or live capital gates. There is no real-money shortcut.
- Passing the gauntlet earns paper, not live capital. The path to real money runs through the strict paper→live gate documented under promotion gates.
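The implausible-metrics rule from the caveats above amounts to a simple guard. The function name and defaults below are hypothetical; only the two thresholds come from the text.

```python
def looks_like_leak(sharpe, profit_factor, max_sharpe=5.0, max_profit_factor=8.0):
    """Flag results that are too good to be honest: a likely look-ahead leak."""
    return sharpe >= max_sharpe or profit_factor >= max_profit_factor
```

A backtest reporting a Sharpe of 5.0 would be flagged regardless of its profit factor, while a Sharpe of 2.1 with a profit factor of 1.8 would pass this check.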
The gauntlet is built to make killing fragile strategies unavoidable. Most strategies fail it — and that is the design working, not failing.
Related
- The pipeline — the full lifecycle the gauntlet sits inside
- Promotion gates — the achievable-paper and strict paper→live gates in detail
- Metrics — every metric the gauntlet reads, and why to trust OOS
- The strategy lab — where you run the gauntlet