On Tuesday the Monkey Cage at the Washington Post published an article, “Do betting markets outperform election polls? Hardly.” The article suffers from two common confusions around both prediction markets and forecasting in general. First, election polls reflect voter intention on the day the poll is taken, while prediction markets reflect the probability of victory on Election Day. These are two related, but different, value types. Second, nothing is 0% or 100% to happen until it happens; before that point there is an unknown underlying probability of occurring that can and should fluctuate over time. That is the value that prediction markets are trying to estimate.
The first chart is supposed to show that polls and prediction markets are similarly accurate in predicting the outcomes of the last five contested primary elections; instead it shows that voter intention and probability of victory fluctuate across the primaries. The chart illustrates Huffington Post’s Pollster poll trend for the last five contested primaries next to the probability of victory derived from the candidate’s prediction market and bookie prices (the author does not note how he converts share price to probability of victory, but I would suggest reading this paper; this is a subtle, but important process). The chart makes both errors noted above: it implies that both value types should be correlated with probability of victory and that the ultimate winner was always the most likely. The author fails to recognize that the polling average is meant to show the current support for candidates, so it is not wrong when it shows 11 lead changes in the 2012 Republican primary. There were points in time when Rick Santorum, Newt Gingrich, etc., had more support than Mitt Romney; the polls are not meant to be predictions of the final outcome. Similarly, prediction markets are not necessarily wrong for assuming that Donald Trump was not the most likely candidate to win the Republican primary in July of 2015. That he did ultimately win does not mean he was 100% to win the entire time (i.e., we do not know the true probability of Trump winning the primary on July 1, 2015, but it was well below 100%).
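To give a sense of why the price-to-probability step matters, here is a minimal sketch of the simplest possible conversion: rescaling the prices for all candidates in a contest so that they sum to 1. This is not the Monkey Cage author's method (which is not stated), and it ignores the subtler adjustments a careful conversion would need. The function name and the example prices are hypothetical.

```python
def implied_probabilities(prices):
    """Rescale raw contract prices (on a 0-1 scale) so they sum to 1.

    Raw prices across all candidates usually add up to more than 1
    because of fees and bid-ask spreads, so they cannot be read
    directly as probabilities. Proportional rescaling is the crudest
    fix; it is an illustrative assumption, not a recommended method.
    """
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("prices must sum to a positive number")
    return {name: price / total for name, price in prices.items()}


# Hypothetical prices for a three-candidate contest (they sum to 1.10).
example_prices = {"Candidate A": 0.55, "Candidate B": 0.30, "Candidate C": 0.25}
example_probs = implied_probabilities(example_prices)

for name, prob in example_probs.items():
    print(f"{name}: {prob:.3f}")
```

In this made-up example, a contract trading at 55 cents corresponds to a 50% probability after rescaling, which shows how reading the raw price as a probability would overstate the favorite.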
Beyond the overall problem of defining accuracy, the first chart and its analysis are seriously flawed: it is very misleading to chart two different value types next to each other with the same y-axis, a leading indicator is not always visible to the naked eye, and five outcomes do not make a scientific analysis. First, the author charts two very different value types, which both run from 0 to 100, next to each other with the same y-axis. This could lead some readers to think they are indexing the same thing. Again, the polls are current voter intention or support and the prediction markets are probability of victory. Second, just because the author does not see one leading the other with the naked eye does not mean that neither leads. I would suggest some form of time-series regression (again accounting for the different value types). Third, this is just five outcomes, and the result basically hinges on Trump winning. If Trump had lost (which I contend had a non-negligible probability for most of the run-up to the voting), polls would have been very bad at predicting (but, again, possibly correct at reflecting the voters’ support for Trump at that point in time).
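As a rough illustration of testing for a leading indicator rather than eyeballing it, the sketch below computes a lagged cross-correlation on day-to-day changes. This is a simpler stand-in for the time-series regression suggested above, not a replacement for it, and differencing is only a crude step toward handling the different value types. The data are synthetic and built so that one series trails the other by two days; nothing here is evidence about real polls or markets, and all names are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def diff(series):
    """Day-to-day changes of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def lagged_correlation(leader, follower, lag):
    """Correlate changes in `leader` at day t with changes in `follower`
    at day t + lag. A negative lag tests the reverse direction."""
    d_lead, d_follow = diff(leader), diff(follower)
    if lag > 0:
        d_lead, d_follow = d_lead[:-lag], d_follow[lag:]
    elif lag < 0:
        d_lead, d_follow = d_lead[-lag:], d_follow[:lag]
    return pearson(d_lead, d_follow)


def best_lag(leader, follower, max_lag=5):
    """Lag in [-max_lag, max_lag] with the highest correlation."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda k: lagged_correlation(leader, follower, k))


# Synthetic series: a random walk, and a copy of it delayed by two days.
rng = random.Random(0)
series_a = [50.0]
for _ in range(59):
    series_a.append(series_a[-1] + rng.gauss(0, 1))
series_b = [series_a[0], series_a[0]] + series_a[:-2]

print(best_lag(series_a, series_b))  # 2, by construction of series_b
```

With real data the correlations would be far noisier, and a proper analysis would also need to account for autocorrelation and for the small number of contests.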
Prediction markets should, and do, have very little value in predicting vote share; the second chart does not recognize this fundamental aspect of a probability relative to a poll share (I demonstrate that here). If a candidate is going to win by 10 or 20 points, she is going to have essentially a 100% probability of victory in either case, but will hopefully be 10 or 20 points up in the polls, respectively. It is possible for a candidate to be likely to win by just 2 or 3 percentage points, yet for the markets to show close to 100% if there is a lot of available information on the electorate. Further, a candidate may be likely to win by 10 points, but with low information, the markets only go to 70% or 80%. A prediction market, built on the binary outcome of winning or losing, is not designed to show by how much. Polling is designed to show that. Thus, it is completely unsurprising that polling is more predictive of vote share than prediction markets.
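The relationship between expected margin, information, and probability of victory can be made concrete with a small sketch. It assumes, purely for illustration, that the final margin is normally distributed around the expected margin, with a standard deviation standing in for how much information is available. The function name and the example numbers are hypothetical, not taken from any market or poll.

```python
import math


def win_probability(expected_margin, margin_sd):
    """Probability that the final margin is above zero, assuming the
    final margin is Normal(expected_margin, margin_sd). Both arguments
    are in percentage points. The normal assumption is illustrative."""
    if margin_sd <= 0:
        raise ValueError("margin_sd must be positive")
    z = expected_margin / margin_sd
    # Standard normal CDF evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Small lead, lots of information (low uncertainty): near certainty.
narrow_but_certain = win_probability(expected_margin=3, margin_sd=1)

# Large lead, little information (high uncertainty): far from certainty.
wide_but_uncertain = win_probability(expected_margin=10, margin_sd=12)

print(f"3-point lead, sd 1:   {narrow_but_certain:.3f}")
print(f"10-point lead, sd 12: {wide_but_uncertain:.3f}")
```

Under these assumptions the 3-point lead gives a probability above 99%, while the 10-point lead gives roughly 80%, so the ordering of probabilities is the reverse of the ordering of margins.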
Prediction markets and polls both did well at their respective tasks this election cycle, but prediction markets were more useful to most people. First, the election eve accuracy was similar for the contests that both covered. Second, prediction markets covered about 50% more contests (smaller races and caucuses, which are probably harder to predict) and were similarly accurate in these extra contests. Third, prediction markets start much earlier on average; this is especially important when studying how early contests affect later contests. Fourth, prediction markets (represented by my site, PredictWise) avoided the major errors of poll-based forecasts like FiveThirtyEight.
I spend a lot of time studying both polls and prediction markets. Researchers should understand that these are two different ways of engaging people with questions about elections, with different audiences, questions, incentives, and aggregation. These differences should be examined for what they are: a great tool for us to learn about data collection and aggregation in its many forms.