Twenty-two strategies entered the finals. Ten of twelve iterations got worse. Fresh challengers beat tweaked veterans 2.4 to 1. And the only strategy to pass every validation gate was designed by a persona who'd already been eliminated.
The last post ended with the finals underway and a question hanging: could a second chance produce something real?
Short answer: yes. One strategy. Out of fifty tested across the entire competition, exactly one passed every validation gate. And it was designed by a persona who’d been eliminated in Round 2 with a 3.7% CAGR.
But I’m getting ahead of myself.
The Setup
Twelve advancing strategies got one iteration. Their creators saw the full Round 2 leaderboard and could change anything: parameters, instruments, logic, risk management. Only constraint: it had to be recognizably the same strategy.
Ten challengers were also invited. Eliminated personas who’d seen what won and what failed, submitting entirely new strategies. No constraints. Clean slate.
Twenty-two strategies total. Same walk-forward test. Same validation gates.
Defense Kills Returns
I’ll start with the bad news.
Ten of twelve iterations performed worse than their Round 2 originals.
| Persona | R2 Sharpe | R3 Sharpe | Change |
|---|---|---|---|
| Klarman | 0.38 | 0.18 | -0.20 |
| Murphy | 0.27 | -0.09 | -0.36 |
| Clenow | 0.31 | -0.26 | -0.57 |
| Tudor Jones | 0.35 | 0.10 | -0.25 |
| Zweig | 0.37 | 0.17 | -0.20 |
| Muller | 0.33 | 0.02 | -0.31 |
Nearly every persona read the Round 2 results and drew the same conclusion: drawdowns are too high, add protection. They were right about the diagnosis. The prescription was poison.
Klarman added a 25% drawdown circuit breaker. It triggered during the COVID crash (correctly), then held him in cash through the fastest recovery in market history. CAGR dropped from 11.1% to 5.4%. The circuit breaker that was supposed to save him cost 5.7 points of annual return.
Clenow added a four-tier regime filter that cut equity exposure when SPY was below its 200-day average. In 2022, SPY spent months below the 200-day while still rallying at various points. The filter kept slashing exposure. 6.8% CAGR became 0.2%.
Murphy added an intermarket regime system. In 2020, his dollar/bond/commodity signals went haywire during the pandemic’s cross-asset chaos. 7.5% CAGR became 0.6%.
The pattern: defensive mechanisms work beautifully in-sample and fail out-of-sample because they’re implicitly fitted to known events. When you tell someone “your strategy had a 50% drawdown,” they don’t add generic protection. They add protection designed to prevent that exact kind of drawdown. Which means they’re fitting to the one drawdown they know about.
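To make the mechanism concrete, here is a toy sketch of how a drawdown circuit breaker can lock in a loss and then miss a V-shaped recovery. Everything in it is an assumption for illustration: the price path is synthetic (not market data), the function name `run_with_breaker` is made up, and the re-entry rule (get back in only once price reclaims its old peak) is hypothetical, since the post doesn’t say how Klarman’s breaker re-entered.

```python
def run_with_breaker(prices, max_dd=0.25):
    """Equity curve for a fully invested strategy with a drawdown breaker.

    Assumptions (illustrative only): the breaker moves to cash from the
    next bar once drawdown from the equity peak reaches max_dd, and it
    re-enters only after price closes at or above its pre-trip peak.
    """
    equity = [prices[0]]          # equity indexed to the first price
    invested = True
    equity_peak = prices[0]
    price_peak = prices[0]
    for prev, cur in zip(prices, prices[1:]):
        # This bar's return counts only if we were invested coming into it.
        value = equity[-1] * (cur / prev) if invested else equity[-1]
        equity.append(value)
        equity_peak = max(equity_peak, value)
        if invested:
            price_peak = max(price_peak, cur)
            if 1 - value / equity_peak >= max_dd:
                invested = False  # breaker trips: cash from the next bar
        elif cur >= price_peak:
            invested = True       # hypothetical re-entry rule
    return equity


# Synthetic V-shaped path: a 26% drop followed by a fast recovery.
toy_prices = [100, 90, 74, 80, 95, 105, 110]
curve = run_with_breaker(toy_prices)
print(f"with breaker: {curve[-1]:.1f}, buy and hold: {toy_prices[-1]}")
```

On this toy path the breaker trips at the bottom, sits in cash through the rebound, and finishes near 77.5 against 110 for buy and hold. The 25% threshold looks reasonable only because the size of the crash is already known.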
The Two That Improved
Simons v2 went from 0.13 to 0.29 Sharpe by broadening from 6 to 8 instruments and switching to a four-signal composite. More signals, more diversification, less dependence on any single regime.
Niederhoffer v2 is the real story. From 0.03 to 0.45 Sharpe. Second-best result in the entire competition. His Round 2 strategy was pure mean-reversion: wait for extreme moves, bet on snapback. It was in a position 3.2% of the time and sat in cash the rest.
For Round 3, he added a permanent 50% SPY base position. Mean-reversion overlays still fire on extremes, but the base keeps the strategy invested during normal times. A mediocre mean-reversion strategy plus a permanent equity allocation equals the second-best risk-adjusted result across 50 strategies. The base position isn’t clever. It isn’t contrarian. It’s just being in the market.
Fresh Eyes Win
Challengers crushed iterations. Average out-of-sample Sharpe for challengers: 0.29. Average for iterations: 0.12. Starting fresh beat incremental improvement 2.4 to 1.
Druckenmiller (0.39 Sharpe) built a liquidity regime strategy that fell short of the 0.50 Sharpe gate. O’Shaughnessy (0.38 Sharpe) matched Klarman’s Round 2 CAGR with better consistency. Lynch (0.36 Sharpe) had high returns, but a 42% drawdown killed him at the gates.
And then there was Soros.
The Winner
George Soros. Eliminated in Round 2 with a 3.7% CAGR and a 0.10 Sharpe. Came back and won the entire competition.
His challenger strategy, the Reflexive Momentum Amplifier, is the only strategy across all 50 tested to pass every validation gate:
| Metric | Value | Gate | Status |
|---|---|---|---|
| OOS Sharpe | 0.69 | min 0.50 | PASS |
| OOS CAGR | 12.4% | min 5.0% | PASS |
| Max Drawdown | 19.7% | max 35% | PASS |
| Calmar Ratio | 0.63 | min 0.30 | PASS |
| Alpha | 5.0% | min 0.0% | PASS |
| Trades | 691 | min 30 | PASS |
| Consistency | 100% | min 60% | PASS |
Every box checked. In-sample Sharpe of 1.00, out-of-sample 0.69: the strategy kept about 70% of its in-sample Sharpe on unseen data.
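The gate table boils down to a handful of threshold checks. Here is a minimal sketch, assuming inclusive comparisons and the standard Calmar definition (CAGR divided by max drawdown). The names `GATES` and `check_gates` are illustrative, not the competition’s actual code. The thresholds and Soros’s metrics come from the table above; Lynch’s figures come from the final leaderboard, with Calmar derived from them.

```python
# Gate thresholds from the table above, expressed as fractions.
# "min" means the metric must be at least the threshold; "max" at most.
GATES = {
    "oos_sharpe":   ("min", 0.50),
    "oos_cagr":     ("min", 0.05),
    "max_drawdown": ("max", 0.35),
    "calmar":       ("min", 0.30),
    "alpha":        ("min", 0.00),
    "trades":       ("min", 30),
    "consistency":  ("min", 0.60),
}


def check_gates(metrics):
    """Return {gate: passed} for every gate with a supplied metric.

    Assumption: comparisons are inclusive (a value exactly on the
    threshold passes). Gates with no supplied metric are skipped.
    """
    results = {}
    for name, (kind, threshold) in GATES.items():
        if name not in metrics:
            continue
        value = metrics[name]
        results[name] = value >= threshold if kind == "min" else value <= threshold
    return results


soros = {
    "oos_sharpe": 0.69, "oos_cagr": 0.124, "max_drawdown": 0.197,
    "calmar": 0.124 / 0.197,  # CAGR / max drawdown, about 0.63
    "alpha": 0.05, "trades": 691, "consistency": 1.0,
}
# Only the metrics published in the leaderboard; the rest are unknown.
lynch = {
    "oos_sharpe": 0.36, "oos_cagr": 0.107, "max_drawdown": 0.42,
    "calmar": 0.107 / 0.42,
}

soros_result = check_gates(soros)
lynch_result = check_gates(lynch)
print("Soros passes all gates:", all(soros_result.values()))
print("Lynch failed:", [g for g, ok in lynch_result.items() if not ok])
```

The gates are conjunctive, so a strong CAGR can’t buy back a failed drawdown or Sharpe gate. That is why Lynch’s 10.7% CAGR wasn’t enough.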
The strategy computes a daily Reflexive Trend Score from four signals: price trend (SPY vs. 200-day), credit health (HYG vs. 100-day), volatility regime (VIX contango vs. backwardation), and market breadth (IWM 50-day return). The composite maps to four allocation regimes, from full risk-on (90% equity with a tech/growth tilt) to full risk-off (treasuries and gold).
It’s 80%+ equity about 80% of the time. But unlike just buying SPY, the four-signal filter provides genuine protection at inflection points. When credit deteriorates, volatility spikes, breadth narrows, AND price breaks the 200-day, all at once, the strategy goes defensive. This happened during COVID and the 2022 rate shock. Both times, it avoided the worst and re-engaged once signals recovered.
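The scoring logic described above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the strategy’s actual code. The four signals and their lookbacks come from the description. Everything else is a guess for illustration: the equal-weight count of bullish signals, the use of a front-versus-back VIX comparison as the contango test, the regime names, and the mapping for the two middle regimes. The post only specifies the endpoints (full risk-on, and full risk-off when all four signals break).

```python
def sma(prices, window):
    """Simple moving average of the last `window` prices."""
    if len(prices) < window:
        raise ValueError(f"need at least {window} prices")
    return sum(prices[-window:]) / window


def reflexive_trend_score(spy, hyg, iwm, vix_front, vix_back):
    """Count bullish signals (0 to 4). Equal weighting is an assumption."""
    signals = {
        "price_trend": spy[-1] > sma(spy, 200),   # SPY vs. 200-day
        "credit":      hyg[-1] > sma(hyg, 100),   # HYG vs. 100-day
        "volatility":  vix_front < vix_back,      # contango = front below back
        "breadth":     iwm[-1] / iwm[-51] - 1 > 0,  # IWM 50-day return
    }
    return sum(signals.values()), signals


def regime(score):
    """Map the score to one of four regimes (middle tiers are hypothetical)."""
    if score == 4:
        return "risk_on"      # post: 90% equity with a tech/growth tilt
    if score == 3:
        return "lean_long"    # hypothetical intermediate regime
    if score >= 1:
        return "cautious"     # hypothetical intermediate regime
    return "risk_off"         # post: treasuries and gold, all four broken


# Synthetic series for demonstration only (not market data).
rising = [100 + 0.1 * i for i in range(250)]
falling = rising[::-1]

bull_score, _ = reflexive_trend_score(rising, rising, rising, 15.0, 18.0)
bear_score, _ = reflexive_trend_score(falling, falling, falling, 40.0, 30.0)
print(regime(bull_score), regime(bear_score))
```

Requiring all four signals to fail before going fully defensive is what keeps the strategy invested most of the time. A single noisy indicator, like the 200-day filter that whipsawed Clenow, can only step exposure down one tier in this sketch.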
The Redemption Arc
Here’s why this matters more than the numbers.
Round 2: Soros designed a reflexive boom-bust regime detector. Tried to time regime breaks, the thing he’s most famous for (breaking the Bank of England, the Asian financial crisis). Strategy spent most of its time in neutral, waiting for a break that happened maybe twice in six years. 3.7% CAGR. Eliminated.
Between rounds: He read the results. Saw that strategies which stayed long beat strategies that tried to be clever about timing. Had the intellectual humility to say: my instinct to time regime breaks is wrong for a systematic strategy in a bull-biased market.
Round 3: Flipped his entire framework. Instead of waiting for the reflexive break, he rides the reflexive trend. Stays aggressively long when the feedback loop is self-reinforcing. Only goes defensive when ALL four signals confirm the loop has broken.
He didn’t change his parameters. He changed his philosophy.
That’s not luck. That’s learning.
The Final Leaderboard
Top 15 across all 50 strategies, ranked by OOS Sharpe:
| Rank | Persona | Round | OOS CAGR | OOS Sharpe | MaxDD |
|---|---|---|---|---|---|
| 1 | Soros | 3C | 12.4% | 0.69 | 19.7% |
| 2 | Niederhoffer | 3I | 7.8% | 0.45 | 15.7% |
| 3 | Druckenmiller | 3C | 8.2% | 0.39 | 23.4% |
| 4 | O’Shaughnessy | 3C | 11.1% | 0.38 | 33.7% |
| 5 | Klarman | 2 | 11.1% | 0.38 | 49.8% |
| 6 | Zweig | 2 | 8.2% | 0.37 | 26.2% |
| 7 | Lynch | 3C | 10.7% | 0.36 | 42.0% |
| 8 | Tudor Jones | 2 | 7.2% | 0.35 | 23.5% |
| 9 | Asness | 3C | 9.0% | 0.33 | 37.2% |
| 10 | Muller | 2 | 8.2% | 0.33 | 23.5% |
| 11 | Gray | 2 | 8.8% | 0.32 | 29.1% |
| 12 | Clenow | 2 | 6.8% | 0.31 | 18.6% |
| 13 | Silver | 2 | 6.6% | 0.31 | 22.9% |
| 14 | Simons | 3I | 5.5% | 0.29 | 12.7% |
| 15 | Murphy | 2 | 7.5% | 0.27 | 29.9% |
3C = Round 3 Challenger, 3I = Round 3 Iteration, 2 = Round 2 original.
Of the top four, three are challengers. The iterations mostly went backward.
What This All Means
Overfitting happens at the concept level. The most dangerous form isn’t fitting to prices. It’s fitting to stories. “COVID caused a 35% crash, so I need a circuit breaker at 25%.” That’s fitting to one event. Nine of twelve personas fell into this trap.
Stay in the market. Niederhoffer’s permanent equity base was worth 0.42 Sharpe points. Soros is 80% equity 80% of the time. The strategies that tried to time their way in and out got whipsawed to death.
The willingness to be wrong is the edge. Soros figured it out. One failure, one honest reflection, one complete redesign. That’s exactly how the research process works, compressed into three rounds instead of 45 sessions.
The Reflexive Momentum Amplifier didn’t win because Soros is the greatest trader. It won because he was the only persona who looked at his failure and changed his fundamental approach. Not his parameters. His thinking.
This concludes the Grand Strategy Competition. But the format has unfinished business. Tournament II is in the works, rebuilt from the ground up with everything the first one taught me. Different structure, different rules, same question: can independent AI minds find edges that a biased researcher can’t? Stay tuned.
