By David M Rothschild

Last week I got into a little Twitter fight with Nate Silver. I tweeted that I am concerned about the FiveThirtyEight forecast: both how volatile it is and how predictable its movements are. Here is the picture I tweeted:


He responded with this tweet: “Never seen otherwise-smart people in so much denial about something as they are about Trump’s chances. Same mistake as primaries, Brexit.” Then he went on a little tweet storm, and I responded. The short answer is that FiveThirtyEight is claiming a toss-up, while I am claiming that Clinton has a small but meaningful lead: 70-75% to win for the last 2 weeks. Here is a summary of my methodological disagreements with Nate Silver (whom I find to be a very nice person and an impressive data journalist):

Polling Data: there are two major poll aggregators, Huffington Post’s Pollster and RealClearPolitics, that concentrate mainly on aggregating polls rather than modeling forecasts. They have different standards for what constitutes a legitimate poll. FiveThirtyEight includes everything that either one of them includes, and a lot more. It includes polls that are obviously not meaningful, such as the Google Consumer Survey. Despite working for Microsoft, I love Google Consumer Survey as a polling platform, but the poll they produce is not meaningful. It is conducted through soft paywalls (ads that pop up to block access to content), which people answer with a lot of noise; Google imputes respondents’ demographics rather than asking for them (again with a lot of noise); and it only rakes to age and gender (when we can assume differential nonresponse on race, education, party identification, etc.). In short, no one takes it seriously, so why does FiveThirtyEight include it?
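To illustrate why raking only to age and gender is a problem, here is a minimal sketch of raking (iterative proportional fitting), which rescales respondent weights until the weighted sample matches known population margins. The respondents, targets, and the `party` field are all hypothetical; the point is that any imbalance on a variable left out of the raking passes straight through to the weighted estimate.

```python
from collections import defaultdict

def rake(rows, targets, iterations=50):
    """Iterative proportional fitting.

    rows: list of dicts, one per respondent.
    targets: {variable: {category: population share}}.
    Returns one weight per respondent.
    """
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        for var, shares in targets.items():
            # Current weighted total for each category of this variable.
            totals = defaultdict(float)
            for row, w in zip(rows, weights):
                totals[row[var]] += w
            grand = sum(weights)
            # Rescale every respondent so this margin matches its target share.
            weights = [
                w * shares[row[var]] * grand / totals[row[var]]
                for row, w in zip(rows, weights)
            ]
    return weights

def weighted_share(rows, weights, var, category):
    hit = sum(w for row, w in zip(rows, weights) if row[var] == category)
    return hit / sum(weights)

# Hypothetical sample: young women are overrepresented, and Democrats are
# overrepresented within every age/gender cell.
rows = (
    [{"age": "18-44", "gender": "F", "party": "D"}] * 40
    + [{"age": "18-44", "gender": "F", "party": "R"}] * 10
    + [{"age": "18-44", "gender": "M", "party": "D"}] * 12
    + [{"age": "18-44", "gender": "M", "party": "R"}] * 8
    + [{"age": "45+", "gender": "F", "party": "D"}] * 10
    + [{"age": "45+", "gender": "F", "party": "R"}] * 5
    + [{"age": "45+", "gender": "M", "party": "D"}] * 9
    + [{"age": "45+", "gender": "M", "party": "R"}] * 6
)
# Rake to age and gender only; party is never corrected.
targets = {
    "age": {"18-44": 0.45, "45+": 0.55},
    "gender": {"F": 0.52, "M": 0.48},
}
w = rake(rows, targets)
d_share = weighted_share(rows, w, "party", "D")
print(f"raked Democratic share: {d_share:.0%}")
```

Both raked margins hit their targets, but because Democrats are overrepresented within every age and gender cell of this made-up sample, the raked Democratic share stays above 60% however the age and gender weights move.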

Poll Aggregation: RealClearPolitics takes a simple average and Pollster fits a local linear trend, while FiveThirtyEight makes a series of “adjustments”. My worry is that these adjustments are poorly identified. They overweight polls with large sample sizes, yet there is no positive correlation between sample size and accuracy, because big samples are normally created to compensate for bad sampling! Hence, Google is the highest-weighted poll in their sample, with its 20,000 responses. They also apply house effects (each pollster’s historical difference from the mean), which seem to move most polls against Clinton. Why not just average the polls? I could go on, but their national polling average has consistently been 1-2 percentage points more favorable to Trump than the other poll aggregators.
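To make the sample-size concern concrete, here is a minimal sketch comparing a simple average (the RealClearPolitics approach described above) with a weighted average in which each poll’s weight grows with the square root of its sample size. The square-root rule is an assumption chosen for illustration, not FiveThirtyEight’s published formula, and all poll margins and sample sizes are hypothetical.

```python
import math

# Hypothetical national polls: (Clinton margin in points, sample size).
polls = [
    (4.0, 1000),
    (3.0, 900),
    (5.0, 1200),
    (2.0, 800),
    (-1.0, 20000),  # one very large poll from a noisy online panel
]

def simple_average(polls):
    """Unweighted mean of the margins."""
    return sum(margin for margin, _ in polls) / len(polls)

def sqrt_n_weights(polls):
    """Assumed weighting rule: weight grows with the square root of n."""
    return [math.sqrt(n) for _, n in polls]

def weighted_average(polls, weights):
    return sum(w * m for w, (m, _) in zip(weights, polls)) / sum(weights)

weights = sqrt_n_weights(polls)
simple = simple_average(polls)
weighted = weighted_average(polls, weights)
big_poll_share = weights[-1] / sum(weights)

print(f"simple average:   Clinton {simple:+.1f}")
print(f"sqrt(n) weighted: Clinton {weighted:+.1f}")
print(f"share of weight on the 20,000-person poll: {big_poll_share:.0%}")
```

With these made-up numbers the simple average is Clinton +2.6, while the weighted average drops to about Clinton +1.1, because the single 20,000-person poll carries just over half of the total weight even though its size says nothing about its quality.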

Forecast (State): starting at the state level, let us look at New Hampshire. PredictWise has New Hampshire at 82% for Clinton, while FiveThirtyEight has it at 63.5%. First, FiveThirtyEight includes Emerson’s poll, which is landline-only. Second, its house-effect adjustment moves Marist from Clinton +2 to Trump +2. Even so, its adjusted margins through August are one at Trump +2 and the rest at Clinton +8, +4, +3, +3, +7, +5, +3, and +4. That seems like a pretty strong run for a candidate with just a 64% chance to win. The other key forecasters are at 85% (NYT Upshot) and 98% (PEC). An even more ridiculous case is Rhode Island, which FiveThirtyEight puts at only 80% Democratic because of an Emerson poll. Everyone else has it as a solid Democratic state. FiveThirtyEight assumes massive, correlated errors within states that are not reasonable.
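One way to see how much uncertainty a 63.5% forecast implies is to work backward. The sketch below treats the adjusted margins listed above as Clinton-minus-Trump margins (so Trump +2 becomes -2), takes their simple mean as the expected margin, and assumes a normally distributed final error. These are simplifying assumptions, not FiveThirtyEight’s model; under them, the code solves for the error standard deviation that would produce a 63.5% win probability.

```python
from statistics import NormalDist, mean

# Adjusted New Hampshire margins through August, as listed above,
# expressed as Clinton minus Trump (Trump +2 -> -2).
margins = [-2, 8, 4, 3, 3, 7, 5, 3, 4]
expected_margin = mean(margins)  # 35 / 9, about Clinton +3.9

def win_probability(expected_margin, sd):
    # P(final margin > 0) if the final margin ~ Normal(expected_margin, sd).
    return 1 - NormalDist(mu=expected_margin, sigma=sd).cdf(0)

def implied_sd(expected_margin, prob):
    # Solve win_probability(expected_margin, sd) == prob for sd.
    z = NormalDist().inv_cdf(prob)
    return expected_margin / z

sd_538 = implied_sd(expected_margin, 0.635)
print(f"expected margin: Clinton {expected_margin:+.1f}")
print(f"error SD implied by a 63.5% win probability: {sd_538:.1f} points")
```

Under these assumptions, a 63.5% forecast corresponds to an error standard deviation of roughly 11 points in a state whose adjusted polls average about Clinton +4.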

Forecast (National): FiveThirtyEight is currently at 58% while NYT is at 71%, PEC is at 82%, and PredictWise is at 70%. A lot of this has to do with the polling choice, aggregations, and forecasts. But some of it has to do with how FiveThirtyEight aggregates its forecasts. It assumes that the national polls (and all state polls) are highly correlated with all of the states (reasonable), but also that the potential error is massive and correlated across states; again, this assumption is not reasonable. This is how they create probability distributions of outcomes that are basically flat, giving non-negligible probability to just about any outcome:


The PredictWise probability distribution is much more reasonable. These outcomes are simply not all equally likely:


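The effect of a large error shared by every state can be illustrated with a small Monte Carlo simulation. Everything here is hypothetical: twenty states worth 10 electoral votes each, with the leading candidate’s polling margin ranging from -7 to +12 points, independent state-level noise, and one national error drawn per simulation and added to every state. The only thing that changes between the two runs is the assumed size of that shared error.

```python
import random
from statistics import pstdev

# Hypothetical map: 20 states, 10 electoral votes each, with the leading
# candidate's polling margin running from -7 to +12 points.
LEADS = list(range(-7, 13))
VOTES_PER_STATE = 10
MAJORITY = len(LEADS) * VOTES_PER_STATE // 2  # more than 100 of 200 wins

def simulate(shared_sd, state_sd=3.0, sims=10000, seed=1):
    """Electoral-vote totals for the leading candidate across simulations."""
    rng = random.Random(seed)
    totals = []
    for _ in range(sims):
        # One national error per simulation, applied to every state at once.
        national_error = rng.gauss(0, shared_sd)
        votes = 0
        for lead in LEADS:
            # Independent state-level error on top of the shared shift.
            if lead + national_error + rng.gauss(0, state_sd) > 0:
                votes += VOTES_PER_STATE
        totals.append(votes)
    return totals

def summarize(totals):
    win = sum(t > MAJORITY for t in totals) / len(totals)
    return win, pstdev(totals)

small = summarize(simulate(shared_sd=2.0))
large = summarize(simulate(shared_sd=8.0))
print(f"small shared error: win prob {small[0]:.2f}, EV sd {small[1]:.1f}")
print(f"large shared error: win prob {large[0]:.2f}, EV sd {large[1]:.1f}")
```

The state-level polling is identical in both runs, yet the larger shared error pulls the win probability closer to 50% and spreads the electoral-vote distribution much more widely, which is the mechanism behind a nearly flat distribution of outcomes.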
Three Forecasts (national): Polls-plus, Polls-only (default), and Nowcast. FiveThirtyEight is actually posting three forecasts. I referenced polls-only above because it is the default forecast. FiveThirtyEight stopped promoting the polls-plus model after the primaries, where it did worse than polls-only. It has been using the Nowcast a lot in its analysis, but that is confusing. Silver admits that Clinton is up 2-3 percentage points nationally and by at least that much in states totaling 270 electoral votes. So, if the election were held today, would he really call it a toss-up, as the Nowcast does?

Volatility and Predictability: all of these forecasts have been circling the PredictWise forecast. They move several points per day, even within a single day, in response to single polls. It is not reasonable to assume the underlying probability of the election moves that much. More importantly, the larger sinusoidal movements are predictable, and that is never good in a forecast, especially around the conventions.
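A standard way to check whether a probability forecast’s movements are predictable is to look at the autocorrelation of its day-to-day changes. If a forecast already incorporates everything known, its future changes should not be predictable from its past changes, so the lag-1 autocorrelation of those changes should be near zero; smooth, wave-like swings produce strongly positive values. Both series below are synthetic, not FiveThirtyEight’s actual history.

```python
import math
import random

def changes(series):
    """Day-to-day differences of a forecast series."""
    return [b - a for a, b in zip(series, series[1:])]

def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(xs)
    m = sum(xs) / n
    num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
    den = sum((x - m) ** 2 for x in xs)
    return num / den

# Synthetic forecast with slow, sinusoidal swings around 70%.
wave = [0.70 + 0.08 * math.sin(2 * math.pi * day / 40) for day in range(120)]

# Synthetic forecast whose daily moves are independent (a clipped random walk).
rng = random.Random(7)
walk = [0.70]
for _ in range(119):
    walk.append(min(0.99, max(0.01, walk[-1] + rng.gauss(0, 0.01))))

wave_ac = lag1_autocorrelation(changes(wave))
walk_ac = lag1_autocorrelation(changes(walk))
print(f"wave: {wave_ac:.2f}")
print(f"walk: {walk_ac:.2f}")
```

The sinusoidal series scores close to 1, meaning yesterday’s move is an excellent predictor of today’s, while the random walk scores near 0. A forecast whose changes look like the first series is leaving predictable information on the table.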

Get-Out-The-Vote: Clinton has an advantage in get-out-the-vote operations. Everyone knows that, and Silver seems concerned that markets are incorporating it. But I think we would both agree that it is impossible to model, yet it should be included if you have a market that can incorporate known information that lacks historical precedent.

Spin: FiveThirtyEight is a magazine run by journalists, while mine is a blog run by a researcher. My goal is to get the most accurate forecasts out there. Its job is to sell advertisements. The toss-up spin is much sexier than a solid but small Clinton lead. And if Trump wins, they will claim a great victory. If Clinton wins, they will claim a smaller share of a pie that is already going to be shared by many people.

Spin (note after publication): I do not believe that FiveThirtyEight altered their model to get one answer or another. They certainly felt burnt after the primaries and added expectations of error into their model, which would add to the volatility and conservativeness of their forecast and would serve to turn a close but meaningful lead into a “toss-up”. But I meant this comment more about their public interpretation of their model. Forecasters, especially ones as famous and renowned as Nate Silver, are not judged on their Brier score or calibration, but on their salient wins and losses. Thus, to maximize their return, they should focus on the narrative they spin as much as, or more than, the accuracy and calibration of their model.
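The Brier score mentioned above is the squared difference between a forecast probability and the outcome (1 if the event happens, 0 if not); lower is better. The sketch below applies it to the national numbers quoted earlier (FiveThirtyEight at 58% and PredictWise at 70% for Clinton) under each possible outcome. It is a single-event illustration; in practice the score is averaged over many forecasts.

```python
def brier(prob, outcome):
    """Brier score for one binary forecast: (probability - outcome) ** 2."""
    return (prob - outcome) ** 2

forecasts = {"FiveThirtyEight": 0.58, "PredictWise": 0.70}  # P(Clinton wins)
for outcome, label in ((1, "Clinton wins"), (0, "Trump wins")):
    scores = {name: round(brier(p, outcome), 4) for name, p in forecasts.items()}
    print(label, scores)
```

If Clinton wins, the scores are 0.1764 and 0.09; if Trump wins, they are 0.3364 and 0.49. A single election moves these numbers only modestly, which is why the headline of who “called it” tends to dominate public judgment instead.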