Logarithmic Scoring Rule: How Markets Reward Accuracy | Probability & Statistical Literacy

The Brier chapter squared forecast error. Many prediction-market mechanisms—especially LMSR from the order-book and AMM material—reward beliefs with logarithmic utility instead. Same honesty incentive, different geometry: small mistakes on unlikely outcomes get punished harshly.

Log score definition

For binary forecast f on YES and outcome o ∈ {0,1}: if o = 1, score S = ln(f); if o = 0, S = ln(1 − f). Equivalently S = o·ln(f) + (1−o)·ln(1−f)—higher is better.

Forecast 80% when YES hits: ln(0.80) ≈ −0.223. Same 80% when NO hits: ln(0.20) ≈ −1.609. Near coin flip 51% either way ≈ −0.673. 99% wrong: ln(0.01) ≈ −4.605. Never assign f = 0 or f = 1 unless you want catastrophe in the log.

Log versus Brier

Both are proper scoring rules—truthful reporting is optimal in expectation. Brier uses squared error; log uses logarithmic surprise. Tail miss on a 99% call: Brier ≈ 0.9025; log ≈ −2.996—log crushes overconfidence on long shots, the failure mode in political “lock” narratives.

Feature	Brier	Log score
Penalty shape	Squared distance	Log surprise
Typical homes	Tournaments, meteorology	LMSR, information markets
Tail 99% wrong	Very bad	Even worse in relative terms

Track Brier and log on the same journal; divergence flags tail overconfidence.

LMSR connection

LMSR market makers use a cost function tied to the log partition of share counts. Traders move prices by buying outcomes; the maker’s worst-case loss is bounded by liquidity parameter b; at the margin, displayed price ≈ implied probability when the pool is deep enough.

You need not memorize the formula to trade—but know log markets punish extreme allocations more than moderate Brier penalties on mid-range errors. Slippage on AMM curves is paying to shift log-implied odds.

Accuracy means calibrated probability mass, not “called YES correctly.” Buying YES when chance is high pays less in expectation; pushing f to 0.99 on thin evidence yields tiny gain if right and huge log loss if wrong. Liquidity providers earn fees but absorb mispriced log insurance if their side is wrong. Arbitrageurs align LMSR f with CLOB c when rules match—the pool does not “know” the other venue.

Worked comparisons

Bill passes; true chance 40%; resolves NO. Trader A at f = 0.42 with small NO hedge: ln(0.58) ≈ −0.544. Trader B at f = 0.90 with loud YES: ln(0.10) ≈ −2.303. B might still profit if they bought YES at 35¢ and sold 80¢ before collapse—log score judges stated beliefs, not trading alpha. Journal both.

Kelly maximizes expected log wealth in classic derivations; log scoring is the one-shot analog for probability reports. Fractional Kelly hedges misspecified p the way capping f away from 0 and 1 hedges log score.

Four cabinet picks with forecasts 0.40, 0.30, 0.20, 0.10; outcome #3 wins: S = ln(0.20) ≈ −1.609. Flat 0.25 each: ln(0.25) ≈ −1.386—less wrong despite no sharp peak. Categorical LMSR pools feel flat until flow concentrates.

Trading log-scored venues

Never treat 1¢ or 99¢ as free options. Simulate AMM fills—your trade moves f along the curve. Compare log-implied f to CLOB c before arb. Cap stated confidence near 0.92 unless resolution is essentially locked and oracle risk is priced. LP is selling log-scored insurance—know b and tail exposure. Post-mortem log score on f at trade time, not exit price.

Oracle reporters weighted by historical log or Brier performance face proper-score incentives: lazy 50/50 is mediocre; bribed certainty loses bonds on NO resolutions. Trading and reporting diverge—you can trade without publishing f, but merged roles inherit log economics.

Pool starts 50/50; you buy YES until f = 0.60. Early dollars are cheap on the log curve; later dollars cost more implied probability. Your average fill, not starting mid, is the honest forecast for scoring and for “what the market believes” claims.

Misconceptions

Log score hates misstated probabilities, not risk with edge. Brier and log rank similarly often but diverge on tails. AMM price is equilibrium given flow and b, not ground truth.

Why tails dominate log scores

Assigning 1% when the true chance is 10% feels modest on Brier but brutal on log if YES hits—you assigned negligible mass to the realized world. That is why LMSR prices resist sitting at 0% or 100% without enormous flow: the mechanism encodes log geometry. Traders should mirror that humility in published f, not only in pool prices.

LP as selling insurance

Providing liquidity on a log-scored pool is selling tail insurance to informed flow. Fees must compensate for the log penalty when you are wrong on a long shot. If b is low, prices move cheaply—manipulation and noise move f without deep information. Module 03 on manipulation cost pairs with this chapter on scoring geometry.

One journal, two scores

After resolution, compute Brier and log on the same timestamped f. If Brier looks fine but log is awful, you are probably overconfident in tails—exactly the bias that breaks Kelly and political “lock” trades. Fix tails first.

Moving an LMSR price (narrative)

A pool starts at 50/50. Your first $20 of YES might move displayed probability to 52% cheaply; the next $200 might move it to 60% expensively because the log cost curve steepens. Your average implied belief is between those prints—not the pre-trade mid. Module 02 on slippage and this chapter on log scoring describe the same phenomenon in engineering and incentive language.

Reporting and trading divergence

You can trade size without ever publishing a public f. Oracle reporters and tournament entrants cannot. When roles merge—staking reputation on resolution—log penalties bite dishonest certainty. Traders who only flip contracts still face log geometry inside AMM prices even when they never submit a score.

Bits intuition

Log score is related to surprise in information theory: assigning tiny probability to what happened means a large negative log. That is why a single 99% miss hurts careers on leaderboards more than a string of 60% misses. Train yourself to widen tails when resolution risk is real—litigation, oracle dispute, ambiguous rule text.

Hybrid venues

Some products show a CLOB mid while a pool backs residual liquidity. Log scoring describes the pool arm; Brier may describe your public tournament arm on the same event. Know which mechanism sets the price you trade and which score judges your published forecast.

Practical cap on confidence

Desk rule of thumb: public f above 0.92 requires written resolution paragraph and second reviewer. Traders without a desk should adopt a personal cap. Log scoring makes the reason obvious—one wrong tail dominates a year of scores.

Summary

Log score rewards honest f, punishes tail overconfidence, and explains AMM price geometry. Trade for P&L; score for calibration; use both to tune Kelly and narrative risk.

What comes next

Next: correlation and joint probabilities—when events move together and your portfolio is one bet.