TL;DR

Not a single strategy from my 80-persona competition passed my standard validation gates. The best Sharpe was 0.38 - my existing portfolio starts at 0.64. Before running the finals, I needed to understand why. The answer was uncomfortable.

Time for an uncomfortable truth.

I just ran 28 strategies through the gauntlet. Klarman won at 11.1% CAGR. Sounds great. Except:

Not a single competition strategy passed my standard validation gates.

My minimum Sharpe threshold is 0.50. The best competition Sharpe was 0.38. My existing portfolio - the one built through 45 sessions of “biased” directed research - contains strategies with Sharpe ratios of 0.64, 0.73, 0.87. The best strategy in this elaborate 80-persona competition wouldn’t even qualify for the bench.

Before running the finals, I needed to understand why. A mid-competition autopsy.

The Gap

My existing portfolio:

| Strategy | What It Does | OOS Sharpe |
|---|---|---|
| S310 | Quality stocks + regime timing + put selling | 0.87 |
| S347 | Credit signal + dispersion + quality stocks | 0.83 |
| S319 | Similar to S310, sector-diversified | 0.73 |
| S341 | Dispersion timing + sector pairs | 0.64 |

The competition leaders:

| Strategy | What It Does | OOS Sharpe |
|---|---|---|
| Klarman | Cheap stocks with quality floor | 0.38 |
| Zweig | Yield curve + momentum timer | 0.37 |
| Tudor Jones | 200-day trend multi-asset | 0.35 |
| Muller | Sector mean-reversion | 0.33 |

Every validated strategy has 2-3 layers. Every competition strategy has 1. That’s the pattern.

Why Everybody Was Bad

Seven root causes. In order of how much they hurt.

1. The Prompt Said “Design ONE Strategy”

This is the original sin.

When I told 80 personas to design their “single best strategy,” they each picked one approach: pure value, or pure momentum, or pure macro timing. Nobody submitted a strategy combining a stock selection layer with a regime timing layer with an options overlay - even though that’s exactly what my best strategies do.

S310 - my top performer at 0.87 Sharpe - is literally “Buffett’s stock picks + Zweig’s timing logic + put selling on top.” It took 45 sessions and 310 iterations to discover that composite. I asked 80 experts to reinvent it in one shot. Of course they couldn’t.

The edge isn’t in the idea. It’s in the composition. A 0.30 Sharpe selection layer plus a 0.30 Sharpe timing layer plus a 0.20 Sharpe options layer can compose to 0.65+. But only if someone thinks to stack them.
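The arithmetic behind layer stacking is worth making explicit. Here's a toy sketch, assuming equal-volatility, equal-weight layers with a uniform pairwise correlation (all numbers illustrative): merely uncorrelated layers with those Sharpes combine to roughly 0.46, and getting to 0.65+ requires the layers to partially hedge each other - plausible, since a regime timing layer pays off exactly when the selection layer draws down.

```python
import numpy as np

def combined_sharpe(sharpes, corr):
    """Sharpe of an equal-weight stack of unit-volatility return streams.

    sharpes : per-layer Sharpe ratios (unit vol => mean return == Sharpe)
    corr    : uniform pairwise correlation between layers (assumed)
    """
    s = np.asarray(sharpes, dtype=float)
    n = len(s)
    cov = np.full((n, n), corr)
    np.fill_diagonal(cov, 1.0)
    mu = s.sum()                                   # stacked mean return
    vol = np.sqrt(np.ones(n) @ cov @ np.ones(n))   # stacked volatility
    return mu / vol

layers = [0.30, 0.30, 0.20]          # selection, timing, options layers
print(round(combined_sharpe(layers, 0.0), 2))     # uncorrelated: ~0.46
print(round(combined_sharpe(layers, -0.25), 2))   # mild hedging: ~0.65
```

The point survives the simplification: no single layer clears a 0.50 Sharpe gate, but the stack does - and only if someone thinks to build the stack.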

2. Nobody Used Options

My three best strategies all sell cash-secured puts on quality stocks. This harvests the variance risk premium - implied volatility systematically exceeds realized volatility. It’s worth roughly 0.15-0.25 Sharpe on top of whatever the equity layer delivers.

Zero competition strategies used options. Zero.

The prompt warned that multi-leg options don’t work on the platform. Apparently that discouraged all exploration - nobody thought to try single-leg puts, which is my best edge. I emphasized the limitation rather than the opportunity.
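The variance risk premium is easy to see in a toy Black-Scholes simulation: price a one-month ATM put at a 20% implied vol, then let the stock realize only 16% vol. The premium collected systematically exceeds the average payoff. All numbers here are illustrative assumptions, not taken from the live strategies.

```python
import numpy as np
from math import erf, log, sqrt, exp

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_put(S, K, T, sigma, r=0.0):
    """Black-Scholes European put price."""
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return K * exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)

rng = np.random.default_rng(0)
S0, K, T = 100.0, 100.0, 30 / 365
iv, rv = 0.20, 0.16                  # implied vol > realized vol (the VRP)
premium = bs_put(S0, K, T, iv)       # what the put seller collects

# simulate terminal prices under the *realized* volatility (zero drift)
z = rng.standard_normal(200_000)
ST = S0 * np.exp(-0.5 * rv**2 * T + rv * sqrt(T) * z)
payoff = np.maximum(K - ST, 0.0)

edge = premium - payoff.mean()       # expected P&L per put from the IV-RV gap
print(f"premium={premium:.3f}  avg payoff={payoff.mean():.3f}  edge={edge:.3f}")
```

A ~4-point implied-vs-realized gap turns into roughly half a dollar of expected profit per $100-strike put per month - small, steady, and exactly the kind of layer that adds 0.15-0.25 Sharpe on top of an equity book.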

3. Translation Loss

Klarman’s design actually had three layers. The core thesis: identify stocks experiencing forced selling - prices down but fundamentals intact. Brilliant. But QuantConnect’s fundamental filter doesn’t expose price history, so the entire forced-selling logic was dropped during implementation. What I actually tested was a generic cheapness screen.

The dumbed-down version still delivered 11.1% CAGR. What could the real one have done?

Every strategy lost nuance in translation from natural language to code. In my normal workflow, I design and code interactively - if the API can’t do something, I immediately pivot. The competition’s one-directional pipeline had no feedback loop.

4. Wrong Scoring Metric

I ranked by CAGR. My real portfolio optimizes for Sharpe, then applies leverage to hit the return target.

A 0.80 Sharpe strategy at 2x leverage is far superior to a 0.40 Sharpe strategy that happens to produce the same raw CAGR. The first is a repeatable edge. The second got lucky. By ranking on CAGR, I told 80 experts to swing for the fences. Swinging for the fences is how you get five futures strategies that lose 100%+.
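The leverage arithmetic is simple but worth spelling out. With toy numbers (assumed for illustration, ignoring financing costs): leverage scales excess return and volatility together, so Sharpe is the lever-invariant quantity - you can always buy more CAGR with a high-Sharpe strategy, but you can't buy more Sharpe.

```python
def levered(excess_return, vol, leverage):
    """Leverage multiplies excess return and volatility together."""
    return excess_return * leverage, vol * leverage

# Strategy A: Sharpe 0.80 (8% excess on 10% vol), run at 2x leverage
ret_a, vol_a = levered(0.08, 0.10, 2.0)

# Strategy B: Sharpe 0.40 (16% excess on 40% vol), unlevered
ret_b, vol_b = 0.16, 0.40

print(ret_a, ret_b)                   # identical 16% excess return...
print(ret_a / vol_a, ret_b / vol_b)   # ...but Sharpe 0.80 vs 0.40
```

Ranked by CAGR, A and B tie. Ranked by Sharpe, A wins at any return target - it reaches B's return with half the volatility.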

5. No Iteration

My validated strategies emerged from ~450 attempts. A 14% hit rate. The path to any good strategy looked like: test signal in isolation (Sharpe 0.25) -> add timing layer (0.35) -> swap to quality stocks (0.50) -> add options overlay (0.65) -> tune thresholds (0.70+). Each step built on the last.

The competition gave each persona one attempt. That's like asking a sculptor to produce a masterpiece by describing it in words, having someone else carve it, and judging the result with no revisions.

6. Futures Were Platform-Broken

Five of six futures strategies exploded. These aren’t bad strategies - Carver literally wrote the textbook on trend following, Dennis’s Turtles made hundreds of millions. The problem is QuantConnect’s data normalization across contract rolls creates discontinuities that leveraged strategies amplify into catastrophe. I lost 20% of the field to platform issues, not strategy issues.
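The roll discontinuity failure mode is mechanical. A hypothetical illustration (not QuantConnect's actual data, just the shape of the problem): if a futures series is spliced across a roll without back-adjustment, the price gap between contracts shows up as a fake one-day "return" - and leverage turns that artifact into a drawdown.

```python
import numpy as np

# Hypothetical contracts: the next contract trades ~3% below the expiring one
front = np.array([100.0, 101.0, 102.0])
back  = np.array([ 99.0,  99.5, 100.0])
spliced = np.concatenate([front, back])   # naive splice, no roll adjustment

rets = np.diff(spliced) / spliced[:-1]
worst = rets.min()                        # the roll gap as a fake return
print(f"fake roll return: {worst:.2%}, at 5x leverage: {5 * worst:.2%}")
```

A ~3% contango gap becomes a ~15% hit at 5x leverage, several times a year, on a signal that never existed. No trend model survives that, regardless of how sound the underlying strategy is.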

7. Fundamental Universe Was Compromised

Seven strategies used dynamic stock universes. The first one timed out after 21 minutes. I had to reduce candidate pools from 300 stocks to 30-50 to get them running - compromising the selection logic. Two strategies (Buffett, Fama) still crashed.

The Uncomfortable Punchline

Here’s what this competition actually revealed:

The bias wasn’t the problem. The bias was the solution.

I kept gravitating toward macro timing + stock selection + options because that’s where the composable edges actually are. The competition confirmed it - by failing to find them any other way.

Forty-five sessions of “biased” directed research produced strategies with Sharpe ratios north of 0.60. Eighty “unbiased” independent designs produced zero that passed validation.

The difference isn’t knowledge - every persona had the right ideas. It’s process:

  • Iterative refinement beats one-shot design
  • Multi-layer composition beats single signals
  • Platform-aware implementation beats platform-agnostic design
  • Sharpe optimization beats direct CAGR targeting

What Happens Now

The finals are running. But I’m not running them the way I originally planned. The original design - let the top 12 iterate once, re-test - would produce marginal improvements at best.

Instead, I’m restructuring. Sharing everything I learned with the personas. Explicitly telling them: layer stacking works. Options overlays work. Sharpe matters more than CAGR. Let the top performers compose their ideas with each other.

Klarman’s stock selection + Zweig’s timing + a put-selling overlay = ???

That’s the real experiment now. And whatever comes out of it informs how I design Tournament II - a full redesign of the competition format, built on everything the first one taught me.

The competition didn’t find the alpha. It found something more valuable: it found why my process works, and exactly how to make the next one better.

The Grand Strategy Competition finals are underway. Results coming soon.