Condorcet's Jury Theorem: When More Agents = Better Decisions

And when it breaks down

Panel 1

The Classic Theorem

Condorcet proved in 1785 that if each juror independently has probability p of being correct on a binary decision, the probability that the majority is correct grows rapidly with the number of jurors — provided p > 0.5. Below p = 0.5 the theorem reverses: more jurors means a worse collective decision. At exactly p = 0.5 the group is no better than a coin flip, no matter how many jurors you add.

P(majority correct) = Σ_{k=⌈n/2⌉}^{n} C(n,k) · p^k · (1−p)^(n−k)
Even a slight edge above chance (p = 0.51) is enough: with 101 jurors the majority is correct ~58% of the time, with 501 it rises to ~67%. At p = 0.7 only 15 jurors push the majority above 95%. The catch: this assumes independence — an assumption that fails badly with correlated AI agents.
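The tail sum above is easy to evaluate directly. A minimal Python sketch (the helper name `p_majority_correct` is mine; valid for odd n, where a strict majority always exists):

```python
from math import ceil, comb

def p_majority_correct(n, p):
    """CJT tail sum: probability that a strict majority of n independent
    jurors, each correct with probability p, reaches the right answer."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(ceil(n / 2), n + 1))

print(p_majority_correct(101, 0.51))  # ≈ 0.58
print(p_majority_correct(15, 0.70))   # ≈ 0.95
```

Running it reproduces the numbers quoted above, and flipping p below 0.5 (e.g. `p_majority_correct(101, 0.49)`) shows the reversal: the majority is now *worse* than a single juror.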
Panel 2

The Correlation Problem

When agents share the same LLM backbone their errors are correlated — they tend to be wrong about the same things. We model this with a correlation parameter ρ (rho). At ρ = 0 agents are fully independent (classic theorem). At ρ = 1 every agent gives the same answer — your committee of seven is really just one voice repeated. The effective number of independent agents is n_eff = n / (1 + (n−1)ρ). Even a small correlation (ρ = 0.3) collapses a panel of 31 agents down to roughly 3 effective independent voters.

n_eff = n / (1 + (n−1)ρ)     then apply CJT with n_eff
This is the key insight for multi-agent AI systems. Using 7 instances of GPT-4 is NOT like having 7 independent jurors. LLMs trained on overlapping data with similar architectures share systematic biases. To get genuine diversity you need different model families, different prompting strategies, or structured disagreement protocols. The chart above shows that at ρ = 0.5, a 31-agent panel performs no better than about 2 independent agents.
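The correction is one line of arithmetic; applying the CJT afterward just means plugging the (rounded) effective count back into the tail sum. A self-contained sketch under those assumptions (function names are mine; n_eff is rounded down to an odd integer so ties cannot occur):

```python
from math import ceil, comb

def n_effective(n, rho):
    """Effective number of independent voters for n agents
    with pairwise error correlation rho."""
    return n / (1 + (n - 1) * rho)

def majority_correct(n, p):
    """CJT tail sum, with n rounded down to the nearest odd integer."""
    n = max(1, int(round(n)))
    if n % 2 == 0:
        n -= 1
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(ceil(n / 2), n + 1))

print(n_effective(31, 0.3))                            # 3.1
print(majority_correct(n_effective(31, 0.3), 0.7))     # panel of 31, ρ = 0.3
print(majority_correct(31, 0.7))                       # truly independent 31
```

The gap between the last two numbers is the cost of correlation: 31 correlated agents at ρ = 0.3 perform like a 3-voter jury, far below what 31 independent voters would achieve at the same p.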
Panel 3

Weighted Voting with Heterogeneous Competence

In practice agents are not equally competent. A specialist model might have p = 0.85 on its domain while a generalist sits at p = 0.60. Simple majority treats them equally. Weighted majority assigns each agent a weight proportional to its log-odds, w_i = log(p_i / (1−p_i)), which is the information-theoretically optimal weighting for independent binary voters. The chart below runs 50,000 Monte Carlo simulations comparing the two strategies for 7 agents with competences spread between p_min and p_max.

w_i = ln(p_i / (1−p_i))     decide correct if Σ w_i · v_i > 0   (v_i = +1 or −1)
When agents have similar competences the two strategies are nearly identical. But as the spread grows, weighted voting pulls ahead — sometimes dramatically. This matters for multi-agent architectures: if you know which agent is stronger on a given task, trust it more. In practice this means tracking per-agent accuracy and using calibrated confidence scores rather than raw majority.
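The comparison is straightforward to reproduce. A Monte Carlo sketch under the section's setup (the function name and the particular competence values are mine, chosen as an illustrative spread between 0.55 and 0.85):

```python
import random
from math import log

def simulate(ps, trials=50_000, seed=1):
    """Monte Carlo accuracy of simple vs log-odds-weighted majority
    for independent binary voters with competences ps."""
    rng = random.Random(seed)
    ws = [log(p / (1 - p)) for p in ps]  # optimal log-odds weights
    simple = weighted = 0
    for _ in range(trials):
        # +1 = correct vote, -1 = incorrect vote
        votes = [1 if rng.random() < p else -1 for p in ps]
        simple += sum(votes) > 0
        weighted += sum(w * v for w, v in zip(ws, votes)) > 0
    return simple / trials, weighted / trials

ps = [0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85]  # heterogeneous panel
s, w = simulate(ps)
print(s, w)  # weighted majority matches or beats simple majority
```

With identical competences (e.g. `[0.7] * 7`) the two strategies agree exactly, since equal weights reduce the weighted sum to a sign-preserving multiple of the simple vote count; the advantage appears only as the spread grows.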