Last week I started working with MSN on their new polling interface. Along with many others, including some of my regular collaborators, I am very excited by the challenge of helping to take data from the standard internet poll (fully opt-in and open to all) and make it both meaningful and interesting. We are going to do this with a mix of two strategies: question design and analytics.
1) Question Design: ensures that we ask the right question, the right way, to the right people. This means reviewing each question to optimize it for an opt-in poll. And it is not just the words, but also the interface and the information we show before, during, and after the poll. Finally, we consider when it is appropriate for different respondents to see different questions, adapting the questions based on earlier answers.
2) Analytics: takes the raw data and transforms it into something useful by taking advantage of all of the data collected in the poll and provided by the respondents.
On Monday and Tuesday of last week we ran our first poll. It was five questions long. The questions were asked in a fixed order (we will add randomization soon), and we received over 770,000 answers to the first question and over 135,000 complete polls:
1) Who do you think will win the presidential election in your state? Democratic candidate (either Clinton or Sanders), Republican Donald Trump, Other candidate, Not sure
2) Who will you vote for president? Democratic candidate (either Clinton or Sanders), Republican Donald Trump, Other candidate, Not sure
3) What is your age group? <18, 18-29, 30-44, 45-64, >=65
4) What is your gender? Male, Female, Prefer not to say
5) What party do you identify with most? Strong Democrat, Weak Democrat, Other/Independent, Weak Republican, Strong Republican
Here is a screenshot of the voter intention question, both before (left) and after (right) answering:
Nationally, 55% of respondents professed their intent to vote for Republican Donald Trump and just 31% stated their intent to vote for Democrat Hillary Clinton (or Democrat Bernie Sanders), so it is unsurprising that this imbalance carries over into the state-by-state raw voter intention. A quick look at the three demographic questions explains why. Estimates from the voter file indicate that about 40% of voters will be 45-64, but 53% of MSN respondents were in that group; meanwhile, 30-44 year olds are underrepresented, at 14% of the MSN poll versus 21% of likely voters. Even more concerning, men made up 61% of respondents but only about 46% of the voting population, and respondents were 53% Republican and 28% Democratic, whereas the voting population is closer to 33% and 39%, respectively.
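One way to see the size of each skew is the ratio of a group's share of likely voters to its share of the poll: a ratio below 1 means the group is over-represented, above 1 means under-represented. The sketch below simply computes that ratio from the percentages quoted above. It is an illustration of the imbalance only, not the modeling approach used for this poll; the function name `representation_weights` and the group labels are hypothetical.

```python
# Shares (%) quoted in the post: MSN respondents vs. estimated voting population.
SAMPLE = {"age 45-64": 53, "age 30-44": 14, "men": 61,
          "Republicans": 53, "Democrats": 28}
TARGET = {"age 45-64": 40, "age 30-44": 21, "men": 46,
          "Republicans": 33, "Democrats": 39}

def representation_weights(sample, target):
    """Hypothetical helper: target share / sample share for each group.

    This is the factor a naive one-variable-at-a-time weighting would
    apply to that group's respondents (>1 means under-represented).
    """
    return {group: target[group] / sample[group] for group in sample}

weights = representation_weights(SAMPLE, TARGET)
# Print groups from most over-represented to most under-represented.
for group, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{group:>12}: poll {SAMPLE[group]}%, "
          f"population {TARGET[group]}%, ratio {w:.2f}")
```

Republicans come out at roughly 0.62 and 30-44 year olds at 1.5, which is why raw, unadjusted answers from this sample cannot be read as a picture of the electorate.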
We asked the expectation question for a reason. Without any fancy analytics, below is the map of raw, average voter expectation (who respondents think will win their state), rather than raw, average voter intention (who respondents in that state said they will vote for). By this metric, North Carolina and then Florida are the two most narrowly Democratic-leaning states, followed by Ohio, Iowa, Nevada, New Hampshire, Virginia, and Pennsylvania, which are slightly more secure. In a binary sense, this matches PredictWise's state-by-state predictions exactly, with one exception: North Carolina, which MSN has as the least strong state for Clinton and PredictWise has as the least strong state for Trump, at just 62%.
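The raw expectation map and the binary comparison can be sketched as follows. This assumes one (state, answer) pair per respondent and, as an assumption not stated in the post, drops "Other" and "Not sure" answers so each state's figure is the Democratic share of major-party expectations. The function names, the two states "A" and "B", the responses, and the benchmark calls are all made up for illustration.

```python
from collections import Counter, defaultdict

def raw_expectation(responses):
    """Raw, unweighted share of respondents in each state expecting a
    Democratic win. ASSUMPTION: "Other" and "Not sure" are dropped, so
    the share is Democratic / (Democratic + Trump)."""
    counts = defaultdict(Counter)
    for state, answer in responses:
        counts[state][answer] += 1
    shares = {}
    for state, c in counts.items():
        two_party = c["Democratic"] + c["Trump"]
        if two_party:  # skip states with no major-party answers
            shares[state] = c["Democratic"] / two_party
    return shares

def binary_calls(shares):
    """Collapse each state's share to a D/R call (exactly 50% -> "tie")."""
    return {s: "D" if p > 0.5 else "R" if p < 0.5 else "tie"
            for s, p in shares.items()}

def disagreements(calls_a, calls_b):
    """States called by both sources where the binary calls differ."""
    return sorted(s for s in calls_a.keys() & calls_b.keys()
                  if calls_a[s] != calls_b[s])

# Made-up responses for two hypothetical states, for illustration only.
responses = [("A", "Democratic"), ("A", "Democratic"), ("A", "Trump"),
             ("A", "Not sure"), ("B", "Trump"), ("B", "Trump"),
             ("B", "Democratic"), ("B", "Other")]
poll_calls = binary_calls(raw_expectation(responses))
other_forecast = {"A": "D", "B": "D"}  # hypothetical benchmark calls
print(poll_calls, disagreements(poll_calls, other_forecast))
```

Reducing each state to a D/R call is what "matches in a binary sense" means here: two sources can disagree on how strong a state is and still agree on who wins it.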
Beyond question design, there is analytics. We model the answers to both the voter expectation and voter intention questions as a function of respondents' age, gender, party ID, and geographic division. Then we post-stratify that model onto the likely voter population. Below is the voter expectation question, modeled and post-stratified. This is not perfect, as we are still working out the models, but again we have nearly perfect agreement with PredictWise on the state-by-state binary outcomes.
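A minimal sketch of the post-stratification step alone, assuming a model has already produced a probability for every demographic cell in every state. The post does not describe its model, so none is fitted here; the function name `poststratify`, the cell layout (state, age, gender, party), the two states "A" and "B", and all numbers are hypothetical.

```python
from collections import defaultdict

def poststratify(cell_estimates, population):
    """Weight modeled cell estimates by each cell's size in the
    likely-voter population, separately for every state.

    cell_estimates: {(state, age, gender, party): p}, where p would come
        from a model fit to the poll (hypothetical values below).
    population: {same cell key: N}, likely-voter count in that cell.
    Returns {state: sum(N * p) / sum(N)}.
    """
    weighted = defaultdict(float)
    totals = defaultdict(float)
    for cell, n in population.items():
        state = cell[0]
        weighted[state] += n * cell_estimates[cell]
        totals[state] += n
    return {state: weighted[state] / totals[state] for state in totals}

# Hypothetical cells: probability of expecting a Democratic win.
cell_estimates = {
    ("A", "45-64", "Male", "Strong Republican"): 0.2,
    ("A", "30-44", "Female", "Strong Democrat"): 0.9,
    ("B", "45-64", "Male", "Strong Republican"): 0.25,
    ("B", "30-44", "Female", "Strong Democrat"): 0.85,
}
# Hypothetical likely-voter counts for the same cells.
population = {
    ("A", "45-64", "Male", "Strong Republican"): 300,
    ("A", "30-44", "Female", "Strong Democrat"): 700,
    ("B", "45-64", "Male", "Strong Republican"): 500,
    ("B", "30-44", "Female", "Strong Democrat"): 500,
}
print(poststratify(cell_estimates, population))
```

The design point is that the weights come from the population counts, not from how many poll respondents fell into each cell, so an over-supply of older, male, or Republican respondents affects only how precisely each cell is estimated, not how much that cell counts in the state total.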