In 2016 I worked with two very different types of data sources: public and private. I used public polling and prediction markets data to predict the outcome of the 51 Electoral College contests (46 or 47 of 51 correct), 34 Senatorial elections (31 or 32 of 34), 435 House elections (TBD), and 12 Gubernatorial elections (10 or 11 of 12). That was published on PredictWise; all of the final election eve predictions are here. Together with Sam Corbett-Davies and Tobias Konitzer, I also ran two major experimental polls using two interfaces: a display poll on MSN and a mobile-only poll on Pollfish. I used both of these data sources to discuss support for public policy and quick reactions to unfolding events. The idea was that in 2016 I would study the data collection and analytics to nurture new processes, answering pressing questions now, and then use them for election forecasts in place of traditional polling once they had been tested. It was my mistake to trust the public polls and disregard the private polls; the MSN polling (45 or 46 of 51) persistently pointed to Trump victories in Michigan and Wisconsin, while the Pollfish polling (45 or 46 of 51) persistently pointed to Trump in Pennsylvania. While the binary accuracy was similar to that of the top public polling, my two experimental polls consistently pointed to Clinton’s trouble in the Rust Belt, with a more Trump-leaning voter population and more support for Trump from key demographics.
The number of binary misses in the public polling is not extraordinary, but the public polling missed three critical states: Michigan, Pennsylvania, and Wisconsin. These were the backbone of Clinton’s so-called firewall. Florida, North Carolina, and Ohio were insurance; if she won any of them she would win the election, but she did not need to win them to win. As I told the New York Times on September 16, the entire election really came down to whether or not she could hold Pennsylvania. If she won Pennsylvania, she would probably carry enough other similar states to reach 270 Electoral Votes; if not, Trump would win enough similar states to carry the election.
Going into Election Day, the poll aggregator Pollster had Pennsylvania at +5.2% for Clinton, RealClearPolitics had it at +1.9%, and my simple average of the last three weeks of polling was +4.6% for Clinton; this is a big, but not insurmountable, lead (as my research here shows). RealClearPolitics was less bullish for Clinton because it uses polling from Republican organizations such as Trafalgar Group and Harper, which rely on landline-only automatic calling (which is illegal to do for cell phones). Without them, the RealClearPolitics average would be +3%, which is big, but not insurmountable. Wisconsin and Michigan were a little more secure for Clinton, with Wisconsin at +6.5% for Clinton in both Pollster and RealClearPolitics, and Michigan at +6.2% and +3.4% respectively. With this data, PredictWise’s forecast was 94 percent for Clinton in Pennsylvania.
Without seeing the internals of these polls, I am comfortable with how the prediction markets (which form the backbone of the PredictWise forecast) aggregated this into prices, and I combined them into an 89 percent probability of victory for Clinton going into the election. Markets were able to disregard the polling in Nevada and predict a strong victory for Clinton based on early voting, which differed dramatically from the polling. In the toss-up states of Florida, North Carolina, and Ohio, and the Clinton firewall states of Michigan, Pennsylvania, and Wisconsin, PredictWise was between the polls-only forecasts (e.g., Sam Wang’s PEC and Pollster), which were very extreme, and FiveThirtyEight (opaque mixture of polls and something), which was more conservative on most forecasts. To be clear, this meant that for Clinton-leaning states that Clinton lost, the polls-only forecasts were more “wrong” than me and FiveThirtyEight was less “wrong”, and for Trump-leaning states that he won, the reverse was true. Nate Cohn’s New York Times’ Upshot was most similar to PredictWise’s forecasts. This translated into the overall forecasts, where PredictWise was more conservative than PEC (>99%) and Pollster (98%), similar to NYT (85%), and more bullish than FiveThirtyEight (71%). My 11 percent likelihood of Trump winning was my empirically-derived probability that public polling would have an error that was (1) historically large, (2) correlated, and (3) in Trump’s favor. That, obviously, is what occurred across Ohio, Wisconsin, Pennsylvania, and Michigan.
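Why correlation matters so much can be illustrated with a small simulation. This is a minimal sketch and not the PredictWise model: it takes the Clinton leads quoted above for Pennsylvania (+4.6), Wisconsin (+6.5), and Michigan (+6.2), and then assumes a normally distributed polling error with a 4-point standard deviation, split into a shared component and a state-specific component by a correlation `rho`. The function name `prob_all_lost`, the 4-point error, and the 0.75 correlation are all hypothetical choices made for illustration.

```python
import random

# Clinton's polling leads in percentage points, as quoted in the post.
LEADS = {"PA": 4.6, "WI": 6.5, "MI": 6.2}


def prob_all_lost(leads, error_sd, rho, n_sims=100_000, seed=0):
    """Estimate the chance that every lead is wiped out by polling error.

    error_sd and rho are ASSUMED values, not estimates from the post.
    Each state's error = shared error + state-specific error, scaled so
    that the total error has standard deviation error_sd and the
    correlation between any two states is rho.
    """
    rng = random.Random(seed)
    shared_sd = (rho ** 0.5) * error_sd
    own_sd = ((1.0 - rho) ** 0.5) * error_sd
    all_lost = 0
    for _ in range(n_sims):
        shared = rng.gauss(0.0, shared_sd)  # error common to all states
        if all(lead + shared + rng.gauss(0.0, own_sd) < 0
               for lead in leads.values()):
            all_lost += 1
    return all_lost / n_sims


# Hypothetical settings: same error size, with and without correlation.
independent = prob_all_lost(LEADS, error_sd=4.0, rho=0.0)
correlated = prob_all_lost(LEADS, error_sd=4.0, rho=0.75)
print(f"independent errors: {independent:.4f}")
print(f"correlated errors:  {correlated:.4f}")
```

Under these assumed settings, losing all three states is far more likely when the errors move together than when they are independent, which is the sense in which a correlated miss is the scenario a forecaster has to price in.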
Polling does two things: estimate the voter population and estimate the support for each candidate from that voter population. Relying on other people’s public polling, it is very hard for a forecaster to estimate where they will go wrong. For public polling data, someone has collected 500 to 1,000 responses and all I get to see is a few polished data points. I do not get to see the decisions that determined the voter population and the support they will give each candidate. My research has frequently emphasized concerns over the likely voter population: can single polls really define the likely voter population as well as we can model it from historical data? Were they missing lower-education white voters? Without seeing the individual responses, I had no disciplined way to correct for that. Were polls underestimating support for Trump from Republicans or higher-educated white women? Without seeing how they were weighting the individual responses, I had no disciplined way to correct for that. And how correlated were the potential sources of error across the states? I do not want to ever again be in the position of forecasting a combinatorial outcome space without access to the raw underlying data.
The voter population that my colleagues and I created for our own polling more closely resembled the true voting population than the estimates from the public polling. The importance of that choice is best demonstrated by Nate Cohn’s interesting experiment of giving four different pollsters (and himself) the exact same polling data and seeing what topline numbers they generated. We certainly absorbed some of the Latino population into our estimate of White voters (i.e., we had too few Latino and too many white voters, compared with the Florida exit polls, for whatever exit polls are worth). But we ultimately had a more Republican/Trump make-up state-by-state (older, whiter, and less educated) than what the public polls estimated. That is why, for the same sentiment data, our projections were consistently more favorable to Trump.
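The mechanics of "same sentiment data, different topline" can be sketched in a few lines. This is only an illustration, not our actual model: the three groups, their support levels, and both sets of electorate shares are hypothetical numbers, and `topline` is a made-up helper name. The point is that support within each group is held fixed while only the assumed make-up of the electorate changes.

```python
def topline(support_by_group, electorate_shares):
    """Weight each group's candidate support by its share of the electorate."""
    total = sum(electorate_shares.values())
    weighted = sum(support_by_group[group] * share
                   for group, share in electorate_shares.items())
    return weighted / total


# HYPOTHETICAL support for Trump within each group (identical in both runs).
support = {"white_non_college": 0.64, "white_college": 0.48, "non_white": 0.20}

# HYPOTHETICAL electorates: B is whiter and less educated than A.
electorate_a = {"white_non_college": 0.35, "white_college": 0.40, "non_white": 0.25}
electorate_b = {"white_non_college": 0.45, "white_college": 0.35, "non_white": 0.20}

share_a = topline(support, electorate_a)
share_b = topline(support, electorate_b)
print(f"electorate A: {share_a:.3f}")
print(f"electorate B: {share_b:.3f}")
```

With identical group-level sentiment, the whiter and less educated electorate produces the more Trump-leaning topline, which is why the voter population estimate deserves as much scrutiny as the support estimate.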
Further, we had support levels for Trump and Clinton that closely resembled the exit polling. We consistently had Republicans supporting Trump at a rate 2 pp higher than Democrats supported Clinton. We had surprisingly strong support from college-educated whites, already controlling for age and party identification. White males supported Trump at 64 percent and white females at 52 percent. We had Hispanic support at 75 percent, a few points higher than the exit polls, but, more crucially for the Rust Belt, we estimated 85 percent support for Clinton from African-Americans, in line with the exit polls and below the 2012 numbers for Obama. All of this is backed up by the polling data and is why our estimates on both Pollfish and MSN were throwing Rust Belt states towards Trump.
I disregarded my experimental polling and went with the public polling and markets (which, by Election Day, largely rely on public polling); it seemed the responsible thing to do ex-ante. There is a long track record of accuracy in traditional polling, and my experimental polling was designed to replace it in the future. Just look at the September map we had up on Pollfish. We had a 0.5 pp lead for Clinton in the popular vote, but had her losing FL, NC, PA, OH, and WI, and tight in MI. While she ultimately carried NH and ME, this map is quite impressive. But I kept it hidden on the Pollfish site, because I did not want to look foolish.
But, more important for the industry, I think that it is a mistake to work with just the topline results from public polling. Without seeing the individual-level data, it is impossible to really address the concerns about the voter population and their support. I look forward to nurturing the next generation of polling and concentrating on predictions from that data. Because, at the end of the day, I would prefer to see 100,000 individual responses than 10,000 topline numbers from 1,000 respondents each (10,000,000 total responses). With 100x fewer responses, I can see so much more about how the population feels about a topic, who will act on it (e.g., vote), and how those estimations correlate among different demographics.
Expect to hear a lot less from me over the next few months as I get back to the science of data collection and analytics, and spend less time discussing the results publicly. I am so thankful for all of the people who followed my work and provided feedback, and I hope that you do not walk away from learning about and engaging with public opinion data because of this outcome. Now, more than ever, we need a population that understands each other and how we feel about different topics. And, if you think of this in terms of market intelligence in general, we need to develop strategies that focus on capturing not just people on average (like the national polling), but that can really understand detailed demographics (like White/non-college/Wisconsin). That is not just important for elections, but, in an increasingly personalized world, for all marketing.