Who Are Superforecasters? The Good Judgment Project | The Science of Superforecasting

The chapter on identifying mispriced markets gave you a repeatable way to hunt for trades where price and belief diverge. That framework assumes your belief is worth betting on. Superforecasting is the discipline that makes that assumption defensible: turning forecasts into scored probabilities, comparing them to outcomes, and learning who actually gets the future right.

The modern story starts with Philip Tetlock and the Good Judgment Project (GJP). In the early 2010s, the U.S. Intelligence Advanced Research Projects Activity (IARPA) funded large geopolitical forecasting tournaments. Thousands of volunteers answered hundreds of questions (cabinet changes, coups, policy moves) with explicit probabilities and deadlines. Tetlock's team showed that a slice of participants, later called superforecasters, beat many professional analysts on the same scoreboard, and that skill was not a one-year fluke.

What the tournaments measured

Forecasting tournaments are not opinion polls. Each question has a clear resolution date and rule. You submit a probability, lock it, and later the world resolves YES or NO. Performance is graded with proper scoring rules—especially the Brier score, which punishes confident wrong answers more than humble ones.
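To make the scoring rule concrete, here is a minimal sketch of the Brier score for binary questions. It assumes the common single-probability form (squared error between the forecast and a 0/1 outcome, averaged over questions, so lower is better); the function name and the sample numbers are illustrative, not taken from any tournament.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between YES probabilities and 0/1 outcomes.

    Binary form: 0.0 is perfect, 1.0 is maximally wrong, and an
    always-50% forecaster scores 0.25.
    """
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equal-length, non-empty sequences")
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Three hypothetical forecasts on an event that resolved NO (outcome 0).
confident_wrong = brier_score([0.95], [0])
humble_wrong = brier_score([0.60], [0])
coin_flip = brier_score([0.50], [0])

print(confident_wrong, humble_wrong, coin_flip)
```

Because the penalty is quadratic, the 95% miss costs roughly two and a half times what the 60% miss costs, which is the sense in which confident wrong answers are punished more than humble ones.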

Calibration asks whether your 70% forecasts happen about seven times in ten. Resolution, or sharpness, asks whether you use the full ten-to-ninety percent range when the evidence warrants it. Persistence asks whether top performers stay strong in year two. Prediction markets add continuous prices, fees, and slippage on top of that scoreboard. A forecaster can have excellent calibration and zero profit if they never trade; a trader can get lucky once with reckless ninety-percent internal confidence. Durable edge in markets needs both honest probabilities and positive expected value after execution costs.

Who superforecasters are—and are not

Superforecasters are not oracles with secret data. In Tetlock's research they were roughly the top few percent of tournament performers who, in aggregate and often individually, updated beliefs incrementally, respected base rates, stayed granular about definitions and timelines, and separated what they knew from what they had merely read.

They are also not the loudest pundits on television. Public expertise often rewards bold narratives and slow public revision. Tournaments reward pre-registered numbers that can be scored. Good Judgment Open and similar platforms let the public practice the same style: lock a probability, wait for resolution, see your Brier. For prediction-market traders, the lesson is not "join a website"—it is score yourself the way research scores forecasters.

Weak forecasters jump from vague intuition to zero or one hundred percent, rewrite history after the fact, and treat disagreement with the crowd as proof the crowd is stupid. Strong forecasters ask what the crowd might know that they do not—then either update or document why they still disagree.

Superforecasters versus market sharps

A GJP-style forecaster outputs a probability f by a deadline. A prediction-market trader outputs bids, offers, and position size. The scoreboards differ—Brier and calibration curves versus profit and loss—but the cognitive habits transfer: incremental updates, explicit base rates, disciplined disagreement with consensus.

Role	Primary output	Primary score
Tournament forecaster	Locked f by deadline	Brier, calibration
Market trader	Orders and size	P&L, drawdown
Ideal retail hybrid	Both locked f and trades	Brier and net EV

Markets embed a crowd probability you can read from price. That consensus is a powerful outside view, but it is not automatically right. Your job is to know when your f differs from executable price for reasons you can defend, not because a narrative feels compelling.

The bridge is the trade journal: record f at entry, never edit it after resolution, and compare to venue price and outcome. That is how you learn whether your mispricing hunts are skill or theater.
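One way to enforce "never edit it after resolution" is to make the journal record immutable. The sketch below is an assumption about how such a journal could look, not a prescribed format: the class name JournalEntry and its fields are hypothetical, and it relies only on Python's standard dataclasses module.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class JournalEntry:
    """One locked forecast. Field names are illustrative."""
    question: str
    f: float                       # your probability of YES, locked at entry
    venue_price: float             # market price of YES at entry (0 to 1)
    locked_at: datetime
    outcome: Optional[int] = None  # 1 = YES, 0 = NO, None = unresolved

    def __post_init__(self):
        if not 0.0 <= self.f <= 1.0:
            raise ValueError("f must be a probability between 0 and 1")

    def resolve(self, outcome: int) -> "JournalEntry":
        # Returns a new record; f and venue_price are copied unchanged.
        return replace(self, outcome=outcome)


entry = JournalEntry(
    question="Policy rate cut at March meeting",
    f=0.64,
    venue_price=0.62,
    locked_at=datetime.now(timezone.utc),
)

try:
    entry.f = 0.90  # attempting to rewrite history
except FrozenInstanceError:
    print("locked forecasts cannot be edited")

resolved = entry.resolve(1)
```

The frozen record does not stop you from deleting a file, but it removes the easy path of quietly nudging a number after the fact.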

A minimal scored example

Suppose you lock f = 0.64 before a central-bank meeting: "policy rate cut at March meeting." The market mid is $0.62. The event resolves YES.

Your Brier contribution is (0.64 − 1)² = 0.1296, a solid score for a correct call; a fifty-percent shrug would have scored 0.25. If you had secretly treated the event as ninety percent likely but only wrote sixty-four percent in your journal, your real belief is unscoreable and you will mis-size the next trade. If you bought YES at fifty-eight cents and made money, P&L celebrates execution; Brier celebrates honesty.
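The two scoreboards in this example can be computed side by side. This is a small sketch using the numbers above; it assumes a one-dollar payout per YES share and ignores fees, and the variable names are illustrative.

```python
f = 0.64            # your locked probability of YES
market_mid = 0.62   # venue mid price at the time you locked f
entry_price = 0.58  # price you actually paid per YES share
outcome = 1         # event resolved YES

# Forecast scoreboard: squared error of the locked probability.
my_brier = (f - outcome) ** 2
market_brier = (market_mid - outcome) ** 2

# Counterfactual: what the same forecast would have scored on a NO.
my_brier_if_no = (f - 0) ** 2

# Trading scoreboard: payout minus cost per share, before fees.
pnl_per_share = outcome - entry_price

print(f"my Brier:      {my_brier:.4f}")
print(f"market Brier:  {market_brier:.4f}")
print(f"Brier if NO:   {my_brier_if_no:.4f}")
print(f"P&L per share: {pnl_per_share:.2f}")
```

The forecast edges out the market mid on this one question (0.1296 versus 0.1444), but a single resolution says little; the comparison becomes meaningful only across many locked entries.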

Profitable trades with bad calibration are rehearsals for a blowup. Excellent calibration with no trades is still valuable—it means your model of the world is learning even when the book offers no edge.

Calibration bins without spreadsheet theater

Imagine you log forty binary forecasts at trade entry over a quarter. You bucket them: fifty to sixty percent, sixty to seventy, seventy to eighty, and so on. In the seventy-to-eighty bucket you said "about seventy-five percent" fifteen times and YES resolved ten times—roughly sixty-seven percent hit rate, not seventy-five. That gap is overconfidence you can fix before you scale aggressive sizing.
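The bucketing described above needs only a few lines. The sketch below assumes fixed ten-point bins and a journal that yields parallel lists of locked probabilities and 0/1 outcomes; the function name and the row layout are illustrative choices.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Group forecasts into ten-point bins and compare mean forecast to hit rate."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    buckets = defaultdict(list)
    for f, outcome in zip(forecasts, outcomes):
        pct = round(f * 100)            # integer percent avoids float edge cases
        low = min(pct // 10, 9) * 10    # 100% goes in the 90-100 bin
        buckets[low].append((f, outcome))
    rows = []
    for low in sorted(buckets):
        pairs = buckets[low]
        n = len(pairs)
        mean_forecast = sum(f for f, _ in pairs) / n
        hit_rate = sum(o for _, o in pairs) / n
        rows.append({
            "bin": (low, low + 10),
            "n": n,
            "mean_forecast": mean_forecast,
            "hit_rate": hit_rate,
            # Positive gap: you forecast higher than reality delivered.
            "gap": mean_forecast - hit_rate,
        })
    return rows


# The seventy-to-eighty bucket from the text: fifteen calls at 75%, ten YES.
rows = calibration_table([0.75] * 15, [1] * 10 + [0] * 5)
for row in rows:
    print(row)
```

With only fifteen entries in a bin the hit rate is noisy, so treat a gap like this as a prompt to size down and keep logging rather than as a precise measurement.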

Profitability can look fine while Brier drifts. Superforecasting catches that drift early because the scoreboard is about frequencies, not feelings.

How this module connects to what you already learned

Foundations covered why crowds can beat average experts and how markets differ from polls. Probability modules gave Bayes, expected value, Brier, and fallacies. The trading module gave journals and bias awareness. The signals module gave consensus synthesis and mispricing audits.

This module sits on top: it makes your fair probability defensible before you size. Later chapters teach decomposition, outside versus inside view, Bayesian updates in practice, multi-lens aggregation, calibration drills, daily habits, why experts fail on leaderboards, and what training evidence actually supports.

Common mistakes even after a clean mispricing audit

Identity trading—forecasting the tribe, not the contract—destroys calibration because you are scoring ideology, not resolution. Scoreless confidence is huge size with a fuzzy internal story. One winning audit does not make you a superforecaster; you need dozens of locked scores.

Ignoring venue price while hunting narrative edge is another failure mode. The market is an input and a benchmark, not an enemy to defeat on cable.

Another mistake is conflating "I was right" with "my probability was good." A forty-cent YES that wins is a great trade, but one win cannot tell you whether the ninety percent you logged was justified; only the hit rate across many ninety-percent calls can. Score the forecast you locked, not the story you tell after resolution.

Habits you can borrow without the app

You do not need a tournament account to copy the incentive structure. At trade entry, record f and do not edit it after resolution. Tag forecasts into bins and review hit rates monthly. Log market consensus beside your number so you can see when you systematically fight the crowd wisely versus stubbornly.

Run parallel drills on questions you do not trade—public forecasting sites, five binary questions per week with locked timestamps. Aggregation helps: averaging independent estimates from two thoughtful people often beats either alone, which is the same wisdom-of-crowds logic markets exploit when errors are partially independent.
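The aggregation claim can be checked on a toy example. The forecasts below are made up for illustration, and the helper name mean_brier is an assumption. Because squared error is convex, the averaged forecast can never score worse than the average of the two individual Brier scores; whether it beats both forecasters depends on how much their errors offset.

```python
def mean_brier(forecasts, outcomes):
    """Average squared error between YES probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical data: five resolved binary questions, two forecasters.
outcomes = [1, 0, 1, 1, 0]
alice = [0.9, 0.4, 0.5, 0.8, 0.1]
bob = [0.6, 0.1, 0.9, 0.5, 0.4]

# Simple unweighted average of the two probabilities, question by question.
averaged = [(a + b) / 2 for a, b in zip(alice, bob)]

brier_alice = mean_brier(alice, outcomes)
brier_bob = mean_brier(bob, outcomes)
brier_avg = mean_brier(averaged, outcomes)

print(f"Alice:    {brier_alice:.3f}")
print(f"Bob:      {brier_bob:.3f}")
print(f"Averaged: {brier_avg:.3f}")
```

In this toy set the two forecasters miss in different places, so the average beats each of them. If their errors were highly correlated, the average would land between the two scores instead.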

Pair tournament discipline with platform reality: your edge lives at the intersection of calibrated belief and executable price after fees. Superforecasting without execution math is academia; execution without calibration is gambling with extra steps.

Tetlock's broader lesson for markets

Tetlock's earlier work on expert political judgment warned that many famous forecasters are overconfident and rarely scored. The Good Judgment Project was partly a response: if you measure forecasts, you can train them. Prediction markets are a parallel measurement device—prices are public, outcomes resolve, P&L is brutal feedback. The trader who ignores both scoreboards is choosing story over evidence.

That does not mean markets are always right. It means your alternative to consensus should be a number you are willing to defend in a journal, not a mood you are willing to defend on social media.

When superforecasting discipline says "no trade"

Elite forecasters pass on many questions. Passing is not missing edge; it is preserving calibration and bankroll for questions where your table, lenses, and economics align. If you cannot lock f before you see price, if your bins say you are overconfident in this range, or if net expected value is negative after honest shrink, the superforecaster move is shadow-logging and walking away.
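The three pass conditions can be written as a checklist. Everything in this sketch is an assumption made for illustration: the function names, the flat per-share fee, and the idea of modeling "shrink" as a weighted blend of your f and the market price. Real venues have different fee schedules, and the right shrink weight depends on your own track record.

```python
def shrink_toward_market(f, market_price, weight=0.5):
    """Blend your probability with the market's. weight=1.0 means no shrink."""
    return weight * f + (1.0 - weight) * market_price


def net_ev_per_share(f, ask, fee_per_share=0.0):
    """Expected profit from buying one YES share at `ask` with a $1 payout."""
    return f * (1.0 - ask) - (1.0 - f) * ask - fee_per_share


def should_trade(f, ask, locked_before_price, overconfident_in_bin,
                 weight=0.5, fee_per_share=0.01):
    """Return (decision, reason) for a long-YES trade."""
    if not locked_before_price:
        return False, "f was not locked before seeing price"
    if overconfident_in_bin:
        return False, "calibration bins show overconfidence in this range"
    shrunk = shrink_toward_market(f, ask, weight)
    ev = net_ev_per_share(shrunk, ask, fee_per_share)
    if ev <= 0:
        return False, f"net EV {ev:+.3f} per share after shrink and fees"
    return True, f"net EV {ev:+.3f} per share after shrink and fees"


thin_edge = should_trade(0.64, 0.63, True, False)
wide_edge = should_trade(0.80, 0.60, True, False)
print(thin_edge)
print(wide_edge)
```

A failed check is still worth recording: shadow-logging f on a passed question keeps feeding the calibration bins without risking bankroll.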

What comes next in this module

You entered superforecasting from signal-reading and mispricing audits. The chapters ahead build the toolkit: decomposing problems into base rates and specifics, balancing outside and inside views, Bayesian updating in practice, multi-lens aggregation, calibration training, daily workflows, why experts fail in fox-and-hedgehog terms, and what the evidence says about learning these skills.

You should leave with trustworthy probabilities—decomposed, anchored, updated, multi-lens, and Brier-ready—before the blockchain module adds contract and oracle risks on top of event probability.

Key ideas to carry forward

Superforecasters are defined by scores, not fame. GJP showed calibration and persistence matter. Your journal is your personal tournament. Markets add execution; both scoreboards matter.

Read any live quote as two questions: is the market wrong, and is my probability good enough to bet on that judgment? This module trains the second question. Without it, mispricing hunts become clever stories with poor long-run Brier scores.
