The Gauntlet

Forven's robustness battery — walk-forward, Monte-Carlo, parameter jitter, cost-stress and regime-split — that kills fragile strategies before they reach paper.

The gauntlet is the robustness battery a strategy must survive after it clears the cheap triage gate and before it is allowed to trade — even in simulation. It is one stage of the pipeline: researching → backtesting → quick_screen → gauntlet → paper → live → retired. The public site sometimes calls the whole funnel "the gauntlet"; in the app, gauntlet is the specific stage that proves an edge is real rather than curve-fitted.

A strategy only enters the gauntlet once quick_screen (the public "screen") has confirmed it has distinct out-of-sample evidence and is not an obvious overfit. The gauntlet then runs five validation tests, ordered cheap-first so weak strategies fail fast. Passing earns a strategy a place in paper trading (the public site calls this stage a "candidate"). It does not earn it real capital — that is a separate, much stricter gate.

Forven is a research tool. The gauntlet measures how a strategy behaved on historical and simulated data; it does not predict the future, and nothing here is financial advice. Survival is evidence of discipline, not a promise of profit.

What the gauntlet tests

The battery runs as an async workflow with ordered steps. Before the robustness tests proper, the gauntlet prepares the strategy:

  1. quick_screen gate — re-checks the overfitting guardrails at entry.
  2. timeframe sweep — backtests across multiple timeframes (default 1h, 4h, 1d).
  3. optimization — a parameter search over the strategy's space.
  4. apply optimized defaults — writes the best params into the strategy.
  5. confirmation backtest — re-validates those params on fresh data.

Then the five robustness tests run, in v2 cheap-first order:

Test | What it proves | Pass criteria (default)
Walk-forward analysis (WFA) | Edge holds on unseen future data | IS→OOS Sharpe degradation <= 35%; >= 40% of folds profitable; min folds 2
Cost-stress | Edge survives realistic trading costs | Stressed Sharpe degradation <= 60%; live gate also wants stressed Sharpe >= 0.3
Monte-Carlo bootstrap | Tail risk is bounded | >= 65% of resamples profitable; 95th-percentile drawdown <= 40%
Regime-split | Edge is not regime-specific luck | Profitable in >= 50% of market regimes
Parameter jitter | Params are stable, not a knife-edge | >= 60% of ±5–10% reruns pass

By default, the required tests are walk_forward, param_jitter, and cost_stress (gauntlet.required_tests). An empty list enforces all five. Walk-forward is mandatory: a config that drops it is self-healed on load, with walk_forward added back, because without it the gauntlet has no honest out-of-sample signal.
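A minimal sketch of how the required-test list described above might be resolved. The function name resolve_required_tests and the identifiers monte_carlo and regime_split are hypothetical; only walk_forward, param_jitter, and cost_stress appear in the documented default.

```python
# Hypothetical identifiers for the five tests, in the cheap-first order above.
ALL_TESTS = ("walk_forward", "cost_stress", "monte_carlo",
             "regime_split", "param_jitter")


def resolve_required_tests(configured):
    """Return the effective list of required tests for a configured value."""
    # An empty (or missing) list enforces all five tests.
    if not configured:
        return list(ALL_TESTS)
    required = list(configured)
    # Walk-forward is mandatory: add it back if the config dropped it.
    if "walk_forward" not in required:
        required.insert(0, "walk_forward")
    return required
```

For example, a config that lists only param_jitter and cost_stress would resolve to walk_forward, param_jitter, cost_stress under this sketch.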

Each test must persist a verdict — pass, fail, or blocked — before the paper-promotion gate runs.

Walk-forward analysis

Walk-forward is the spine of the battery. The data is split into folds (default 5), each trained on the first 70% (in-sample) and validated on the last 30% (out-of-sample). The strategy is judged on the OOS portion it never saw during tuning.

Folds with fewer than 5 OOS trades are excluded from both the numerator and denominator of the pass rate, so a thin-trading strategy can still pass if the few trades it makes are good. Walk-forward applies realistic costs: 4.5 bps fees and 2.0 bps slippage per trade by default.
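The fold arithmetic above can be sketched as follows. This assumes each fold is a contiguous, equal-length slice split 70/30 internally, that a fold counts as profitable when its OOS PnL is positive, and that degradation is (IS Sharpe - OOS Sharpe) / IS Sharpe; none of these details are confirmed here, and all function names are illustrative.

```python
def fold_bounds(n_bars, n_folds=5, in_sample_pct=0.70):
    """Return [((is_start, is_end), (oos_start, oos_end)), ...] bar indices."""
    fold_len = n_bars // n_folds
    bounds = []
    for k in range(n_folds):
        start = k * fold_len
        cut = start + round(fold_len * in_sample_pct)
        bounds.append(((start, cut), (cut, start + fold_len)))
    return bounds


def fold_pass_rate(folds, min_oos_trades=5):
    """folds: list of (oos_trade_count, oos_pnl). Returns (rate, eligible_folds).

    Folds with too few OOS trades are dropped from both the numerator
    and the denominator.
    """
    eligible = [pnl for trades, pnl in folds if trades >= min_oos_trades]
    if not eligible:
        return None, 0
    wins = sum(1 for pnl in eligible if pnl > 0)
    return wins / len(eligible), len(eligible)


def sharpe_degradation(is_sharpe, oos_sharpe):
    """Fractional IS-to-OOS Sharpe decay; assumes is_sharpe > 0."""
    return (is_sharpe - oos_sharpe) / is_sharpe
```

With five folds of which two are thin, the pass rate is computed over the remaining three only, so two profitable folds out of three give roughly 67%, above the 40% default.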

Parameter jitter

Jitter perturbs the best parameters by ±5–10% and reruns the full backtest, nominally about 50 times, though each run is capped at 30 iterations and 4380 bars per rerun, with a 240-second wall-clock safety net. If most reruns survive, the params describe a real basin of edge rather than a single lucky coordinate. The pass floor is 60%.
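A rough sketch of the jitter loop, assuming "±5–10%" means each numeric parameter is scaled up or down by a random magnitude between 5% and 10%. The passes callback stands in for a full backtest plus gate check, and both function names are hypothetical.

```python
import random


def jitter_params(params, rng, lo=0.05, hi=0.10):
    """Scale every parameter by a random factor 5-10% away from 1.0."""
    jittered = {}
    for name, value in params.items():
        magnitude = rng.uniform(lo, hi)
        sign = rng.choice((-1.0, 1.0))
        jittered[name] = value * (1.0 + sign * magnitude)
    return jittered


def jitter_pass_rate(params, passes, iterations=30, seed=0):
    """Fraction of jittered reruns for which passes(jittered_params) is true."""
    rng = random.Random(seed)  # seeded so a run is reproducible
    hits = sum(1 for _ in range(iterations)
               if passes(jitter_params(params, rng)))
    return hits / iterations
```

A strategy would clear the default floor when jitter_pass_rate(...) >= 0.60.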

Cost-stress

Cost-stress doubles fees and slippage across the full window and measures how much Sharpe decays. The edge must survive: degradation <= 60%, and for the strict live gate a stressed Sharpe >= 0.3. Many strategies that look profitable on paper-thin margins die here, which is the point — live trading is not free.
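As an illustration of the degradation measurement, the sketch below applies a per-trade cost to gross trade returns, doubles it, and compares the two Sharpe ratios. It assumes the baseline cost equals the walk-forward defaults (4.5 + 2.0 = 6.5 bps) and uses an unannualized per-trade Sharpe; how Sharpe and costs are actually computed is not specified here.

```python
import statistics


def sharpe(returns):
    """Unannualized Sharpe: mean over sample standard deviation."""
    sd = statistics.stdev(returns)
    return statistics.mean(returns) / sd if sd > 0 else 0.0


def cost_stress(gross_returns, cost_bps=6.5, multiplier=2.0):
    """Return (base_sharpe, stressed_sharpe, degradation_pct).

    Assumes base_sharpe > 0; a non-positive baseline has no
    meaningful degradation percentage.
    """
    cost = cost_bps / 10_000
    base = sharpe([r - cost for r in gross_returns])
    stressed = sharpe([r - cost * multiplier for r in gross_returns])
    degradation_pct = 100.0 * (base - stressed) / base
    return base, stressed, degradation_pct
```

Under these assumptions the default check would be degradation_pct <= 60.0, with the strict live gate additionally requiring stressed >= 0.3.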

Monte-Carlo and regime-split

Monte-Carlo resamples the trade sequence to estimate tail risk; at least 65% of resamples must stay profitable and the 95th-percentile drawdown must stay at or under 40% (the paper gate clamps this to 50%). Regime-split partitions trades by market regime — trending up, trending down, range-bound, high-volatility — and requires the strategy to be profitable in at least half of them, so it isn't just a bull-market artifact.
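Both checks can be sketched in a few lines. This assumes the bootstrap resamples per-trade returns with replacement, that drawdown is measured on a compounded equity curve, and that a regime counts as profitable when its summed PnL is positive. The function names and regime keys are illustrative only.

```python
import math
import random


def max_drawdown(trade_returns):
    """Largest peak-to-trough drop of the compounded equity curve (fraction)."""
    equity = peak = 1.0
    worst = 0.0
    for r in trade_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def monte_carlo(trade_returns, n_resamples=1000, seed=0):
    """Return (fraction of profitable resamples, 95th-percentile drawdown)."""
    rng = random.Random(seed)
    profitable = 0
    drawdowns = []
    for _ in range(n_resamples):
        # Bootstrap: draw the same number of trades, with replacement.
        sample = rng.choices(trade_returns, k=len(trade_returns))
        if math.prod(1.0 + r for r in sample) > 1.0:
            profitable += 1
        drawdowns.append(max_drawdown(sample))
    drawdowns.sort()
    p95 = drawdowns[min(len(drawdowns) - 1, int(0.95 * len(drawdowns)))]
    return profitable / n_resamples, p95


def regime_split_passes(pnl_by_regime, min_fraction=0.50):
    """True when enough regimes show positive PnL."""
    wins = sum(1 for pnl in pnl_by_regime.values() if pnl > 0)
    return wins / len(pnl_by_regime) >= min_fraction
```

The default criteria then read as profitable_fraction >= 0.65 and p95 <= 0.40 for Monte-Carlo, and regime_split_passes(...) for the regime test.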

The robustness score and verdicts

The gauntlet produces a robustness score on a 0–100 scale and per-test verdicts. The lean gauntlet→paper gate — what the promotion gates page calls the "achievable paper" gate — requires:

  • robustness score >= 50/100
  • total return >= 0%
  • maximum drawdown <= 30%
  • OOS profit factor >= 1.05
  • OOS Sharpe >= 0 (advisory)

These thresholds are deliberately reachable, even in adverse market conditions, so the funnel keeps moving while still filtering clear losers. Capital safety is enforced later, at the much stricter paper→live gate. That two-gate design — achievable paper, strict live — is intentional: paper costs nothing, so the bar is "not obviously broken"; live risks real money, so the bar is "prove forward edge."
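The lean paper gate listed above could be expressed roughly as follows. The GauntletResult fields and the paper_gate function are hypothetical names, and returns and drawdown are assumed to be stored as fractions; the advisory Sharpe check is reported as a warning rather than a rejection.

```python
from dataclasses import dataclass


@dataclass
class GauntletResult:
    robustness_score: float   # 0-100
    total_return: float       # fraction, 0.05 = 5%
    max_drawdown: float       # fraction, 0.30 = 30%
    oos_profit_factor: float
    oos_sharpe: float


def paper_gate(result):
    """Return (passed, rejection_reasons, advisory_warnings)."""
    reasons, warnings = [], []
    if result.robustness_score < 50:
        reasons.append("robustness score below 50")
    if result.total_return < 0.0:
        reasons.append("total return below 0%")
    if result.max_drawdown > 0.30:
        reasons.append("max drawdown above 30%")
    if result.oos_profit_factor < 1.05:
        reasons.append("OOS profit factor below 1.05")
    # Advisory only: recorded, but does not block promotion.
    if result.oos_sharpe < 0.0:
        warnings.append("OOS Sharpe below 0 (advisory)")
    return not reasons, reasons, warnings
```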

A set of hardcoded floors (_PAPER_GATE_FLOORS) means config can only make these gates stricter, never looser. You cannot relax robustness below 50, MC drawdown above 40%, the WFA fold pass rate below 40%, jitter pass rate below 60%, or trade count below 30.
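One way to picture the "stricter only" rule is a clamp applied after config is loaded. The structure below is an assumption: the text names _PAPER_GATE_FLOORS but not its layout, and the min_trades key is a hypothetical name for the trade-count floor.

```python
# Hypothetical layout: key -> (direction, hard bound).
# "min" bounds can only be raised; "max" bounds can only be lowered.
FLOORS = {
    "min_robustness_score": ("min", 50),
    "mc_max_dd_p95": ("max", 0.40),
    "wfa_fold_pass_rate_min": ("min", 0.40),
    "param_jitter_pass_rate_min": ("min", 0.60),
    "min_trades": ("min", 30),
}


def apply_floors(config):
    """Return a copy of config with every gated key clamped to its floor."""
    clamped = dict(config)
    for key, (direction, bound) in FLOORS.items():
        value = clamped.get(key, bound)
        if direction == "min":
            clamped[key] = max(value, bound)
        else:
            clamped[key] = min(value, bound)
    return clamped
```

In this sketch, setting min_robustness_score to 30 yields 50 after clamping, while setting it to 70 is kept as is.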

Running the gauntlet

You run the gauntlet from the strategy lab on a strategy that has already passed quick_screen.

Steps

  1. Open the lab and select a strategy sitting at the quick_screen stage.
  2. Confirm it has distinct out-of-sample evidence; without it, the gauntlet refuses admission with the reason "no distinct OOS evidence".
  3. Start the gauntlet (the lab queues the workflow, or the autonomous pipeline starts it for you).
  4. Watch the step progress tracker as each step claims, runs, and either completes or blocks.
  5. Read the verdict panel: each test shows pass / fail / blocked, plus the composite robustness score.
  6. If all required tests pass, the strategy advances to paper automatically — unless an operator approval is required, in which case it queues in the approvals panel.

If you prefer the API, the lab calls these endpoints under the hood:

# Check what's left before a manual promotion
curl.exe http://127.0.0.1:8003/api/strategies/$StrategyId/readiness

# Claim, complete, block, or retry individual gauntlet steps
curl.exe -X POST http://127.0.0.1:8003/api/gauntlet/steps/$StepId/retry
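The same two calls can be made from Python with the standard library. This is a sketch only: the paths and port are taken from the commands above, the response format is not documented here so the body is returned as raw text, and the helper names are hypothetical.

```python
import urllib.request

BASE_URL = "http://127.0.0.1:8003"


def readiness_request(strategy_id):
    """Build the GET request for a strategy's promotion readiness."""
    url = f"{BASE_URL}/api/strategies/{strategy_id}/readiness"
    return urllib.request.Request(url, method="GET")


def retry_step_request(step_id):
    """Build the POST request that retries one gauntlet step."""
    url = f"{BASE_URL}/api/gauntlet/steps/{step_id}/retry"
    return urllib.request.Request(url, data=b"", method="POST")


def send(request, timeout=10):
    """Send a request and return (HTTP status, response body as text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling send(readiness_request(strategy_id)) requires the local Forven service to be running on that port.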

What you'll see

In the lab's gauntlet panel, each step renders its status (pending, running, passed, blocked_*, failed_gate) and the validation artifacts it produced — WFA degradation, MC drawdown, jitter pass rate, and so on. The final paper-promotion gate shows either a clear pass or the exact gate-rejection reason, so you always know why a strategy stopped.

When steps block or fail

The gauntlet does not silently retry forever, and it does not loop:

  • Gate failure (gate_failure) — the strategy genuinely failed a test. No auto-retry. The strategy is demoted back to quick_screen. After three demotions it is redirected to research_only and removed from the tradable pipeline, so repeat failures stop burning compute.
  • Transient block (blocked_data, blocked_runtime) — a data or runtime hiccup. Retries with exponential backoff (2/4/8/16/30+ minutes) up to eight attempts, then terminal.
  • Slot contention (gate_contention) — only when optional slot competition is on; the step retries on a 10-minute backoff until a slot frees.
  • Manual retry — you can always retry a blocked step yourself from the lab or via the retry endpoint.
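The retry rules above can be sketched as two small functions. Two readings are assumed here and may not match the real scheduler: "30+" is treated as a flat 30-minute cap, and the redirect to research_only happens on the third demotion. Function names are illustrative.

```python
def backoff_minutes(attempt, base=2, cap=30):
    """Delay before retry number `attempt` (1-based): 2, 4, 8, 16, then capped."""
    return min(base * 2 ** (attempt - 1), cap)


def next_action_for_transient_block(attempt, max_attempts=8):
    """Return ("retry", delay_minutes) or ("terminal", None)."""
    if attempt > max_attempts:
        return "terminal", None
    return "retry", backoff_minutes(attempt)


def stage_after_gate_failure(demotion_count):
    """Stage after a gate failure; demotion_count includes the current one."""
    return "research_only" if demotion_count >= 3 else "quick_screen"
```

Under these assumptions the eight transient retries wait 2, 4, 8, 16, 30, 30, 30, and 30 minutes, and a ninth block is terminal.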

A stale step stuck in running past gauntlet.async_result_max_age_minutes (default 60) is recovered automatically so a zombie worker can't jam the workflow.
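The staleness test amounts to an age comparison. In this sketch, is_stale and its arguments are hypothetical, and the age is assumed to be measured from the moment the step entered running.

```python
from datetime import datetime, timedelta, timezone


def is_stale(status, started_at, now, max_age_minutes=60):
    """True when a step has sat in `running` longer than the allowed age."""
    if status != "running":
        return False
    return now - started_at > timedelta(minutes=max_age_minutes)
```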

Tuning the gauntlet

The thresholds live in config under the gauntlet.*, walk_forward.*, and robustness_thresholds.* key groups. A few of the most-touched knobs:

{
  "gauntlet": {
    "required_tests": ["walk_forward", "param_jitter", "cost_stress"],
    "min_robustness_score": 50,
    "max_drawdown_pct": 0.30,
    "wfa_max_degradation": 0.35,
    "mc_max_dd_p95": 0.40
  },
  "walk_forward": {
    "n_folds": 5,
    "in_sample_pct": 0.70,
    "fee_bps": 4.5,
    "slippage_bps": 2.0
  },
  "robustness_thresholds": {
    "monte_carlo_percentile_min": 0.65,
    "param_jitter_pass_rate_min": 0.60,
    "cost_stress_max_degradation_pct": 60.0,
    "regime_split_profitable_min": 0.50,
    "wfa_fold_pass_rate_min": 0.40
  }
}

Drawdown and return values are stored as fractions (0.30 = 30%) and shown as percentages in the UI. The exception in the example above is cost_stress_max_degradation_pct, which is already expressed in percent (60.0 = 60%). See the configuration reference for precedence and the full key list. Remember the floors: tightening these values works; loosening below the hard minimums does not.
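To make the units concrete, the sketch below reads a cut-down copy of the config and formats a fraction for display. The dotted-key helper get_key and the formatter as_percent are illustrative and not part of Forven.

```python
import json

# Cut-down copy of the example config, for illustration only.
CONFIG_TEXT = """
{
  "gauntlet": {"max_drawdown_pct": 0.30, "wfa_max_degradation": 0.35},
  "robustness_thresholds": {"cost_stress_max_degradation_pct": 60.0}
}
"""


def get_key(config, dotted_key):
    """Look up a value by a dotted path such as 'gauntlet.max_drawdown_pct'."""
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


def as_percent(fraction):
    """Format a stored fraction (0.30) as a display percentage ('30%')."""
    return f"{fraction * 100:.0f}%"


config = json.loads(CONFIG_TEXT)
```

Here as_percent(get_key(config, "gauntlet.max_drawdown_pct")) gives "30%", while the cost-stress value is read as 60.0 and needs no conversion.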

Honest caveats

The gauntlet is beta software, and a few rough edges are worth knowing:

  • Walk-forward windowing can surface adequacy warnings on thin data (OOS windows under ~30 days at 1h, or sparse folds). Read them — they usually flag a data gap, not a crash.
  • Implausible metrics are rejected, not celebrated. A Sharpe >= 5 or profit factor >= 8 on honest crypto data is treated as a look-ahead leak signature and blocked at both quick_screen and gauntlet entry. If a strategy looks too good, the gauntlet assumes it cheated.
  • testing_mode bypasses the quick_screen and gauntlet gates for rapid iteration, but it never bypasses the paper or live capital gates. There is no real-money shortcut.
  • Passing the gauntlet earns paper, not live capital. The path to real money runs through the strict paper→live gate documented under promotion gates.
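The implausible-metric caveat above reduces to a simple threshold check. The function name is hypothetical; the cutoffs are the ones stated in the list.

```python
def looks_like_lookahead_leak(sharpe, profit_factor,
                              sharpe_cutoff=5.0, profit_factor_cutoff=8.0):
    """True when metrics are too good to be trusted on honest data."""
    return sharpe >= sharpe_cutoff or profit_factor >= profit_factor_cutoff
```

A strategy reporting a Sharpe of 6.2 would be blocked under this check even if every other metric looked healthy.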

The gauntlet is built to make killing fragile strategies unavoidable. Most strategies fail it — and that is the design working, not failing.

See also

  • The pipeline — the full lifecycle the gauntlet sits inside
  • Promotion gates — the achievable-paper and strict paper→live gates in detail
  • Metrics — every metric the gauntlet reads, and why to trust OOS
  • The strategy lab — where you run the gauntlet