Tobias Konitzer and David Rothschild

There is a lot of discussion about the value of “big data” in creating actionable market intelligence. Hedge funds buy up all sorts of large behavioral datasets to gauge the growth of consumer-facing companies. This is not new: decades ago they counted cars in parking lots, then moved on to satellite images, and then to caches of digital footprints left as people move along the “purchase funnel” (from searching for a product to buying it). This is, without question, good for gauging real-time sales metrics, but what about prospective dynamics? Can surveys provide value beyond this data by adding (1) representativeness and (2) the ability to look forward?

This blog post introduces a novel experiment we are running with Pollfish, a mobile-based polling platform. We are trying to assess how useful survey research is for questions of prospective economic and company growth. In contrast to YouGov or the Morning Consult Brand Index, we do not use net promoter scores or other metrics of company visibility and retrospective purchasing behavior. Instead, we focus on prospective purchasing, and rely on a novel and superior polling methodology we have built out over the years.

NOTE: We are happy to make historical data available upon request. Please email us and tell us what you think would be an interesting study with this data! Or take a look at our more detailed write-up.

Simple outcome, change in stock price over the next month: To assess prospective changes in stock prices, we start by exploiting a single survey question: “Regarding shopping at X, do you plan to spend more next month than last month?” Our surveys also include a host of questions on name recognition, likelihood to recommend, retrospective purchasing behavior, and dislike, but we start with this one question. We obtain representative national estimates using our MRP+ methods (described below). Our first test is to validate that subsequent changes in stock price are correlated with our survey measure, using a subset of the data from February to April 2017. Good start!
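A minimal sketch of what this first validation step amounts to: pair each company-month's survey estimate with the stock return over the following month and compute their correlation. The `pearson` helper and every number below are illustrative assumptions, not our data or our code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical company-months (NOT PredictWise data): estimated share of adults
# who plan to spend more next month, and the stock return over that next month.
plan_more = [0.18, 0.22, 0.25, 0.15, 0.30]
next_month_return = [-0.01, 0.02, 0.03, -0.02, 0.05]

r = pearson(plan_more, next_month_return)
print(f"correlation: {r:.2f}")
```

A positive correlation is only a first signal; it does not yet control for market-wide movements.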


In plain English, we are looking at the economy and companies the way we look at elections. We are forecasting purchase share in the general population, rather than vote share. While we want to understand politics for normative reasons, we need these economic outcomes to provide more diversified and regular outcomes with which to calibrate our polling. Further, like elections, these economic outcomes are correlated in unique ways, providing an interesting challenge that our models are built to address.

Our methodology has two stages:

1) Model: First, we model the raw responses to each question given each respondent’s age, gender, location, education level, race, marital status, party identification, income, family size, and urbanicity, using state-of-the-art machine learning. This information divides the population into thousands of demographic categories. We do not use just the current poll, but every poll in which we have asked the same or related questions. For each subgroup and poll question, we predict the percentage of people who would give each answer if the entire country showed up to the poll.

2) Target Population: Second, we project our estimates for each sub-group onto the US adult population, a process known as post-stratification. We derive the target population from a Big Data combination of population-level census data and proprietary financial and political data sets.
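The two stages can be illustrated with a deliberately simplified sketch. It assumes each respondent has already been assigned to a demographic cell, and it swaps in a simple shrinkage estimator (each cell's raw share pulled toward the overall share) for the machine-learning model described above. The function names, `prior_strength`, the two-cell grid, and all counts are hypothetical.

```python
from collections import defaultdict


def fit_cells(responses, prior_strength=20.0):
    """Stage 1 (simplified): estimate the share answering "yes" in each cell.

    responses: iterable of (cell, answered_yes) pairs, pooled across polls.
    Each cell's raw share is shrunk toward the overall share, so sparse cells
    borrow strength from the rest of the sample. This is a stand-in for the
    machine-learning model, not that model.
    """
    yes = defaultdict(int)
    n = defaultdict(int)
    for cell, answered_yes in responses:
        n[cell] += 1
        yes[cell] += int(answered_yes)
    overall = sum(yes.values()) / sum(n.values())
    estimates = {
        cell: (yes[cell] + prior_strength * overall) / (n[cell] + prior_strength)
        for cell in n
    }
    return estimates, overall


def poststratify(cell_estimates, population, fallback):
    """Stage 2: weight each cell's estimate by its share of the target population.

    population maps cell -> number of adults; cells with no respondents use
    the fallback estimate.
    """
    total = sum(population.values())
    return sum(
        count * cell_estimates.get(cell, fallback)
        for cell, count in population.items()
    ) / total


# Hypothetical sample that over-represents young college graduates.
young_college = ("18-29", "college")
older_no_college = ("30+", "no college")
responses = (
    [(young_college, True)] * 30 + [(young_college, False)] * 70
    + [(older_no_college, True)] * 10 + [(older_no_college, False)] * 90
)
# Hypothetical population counts (in practice derived from census and other data).
population = {young_college: 20_000_000, older_no_college: 80_000_000}

cell_estimates, overall = fit_cells(responses)
estimate = poststratify(cell_estimates, population, fallback=overall)
print(f"raw sample: {overall:.3f}, post-stratified: {estimate:.3f}")
# raw sample: 0.200, post-stratified: 0.150
```

In this toy example the raw sample says 20% plan to spend more, but because the sample over-represents the first cell, weighting by the population brings the estimate down to 15%.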

This method has some key advantages over traditional polling:

Depth: We can present movements with unprecedented demographic and geographic granularity. The data file includes combinations of two demographic variables within a geography (e.g., 18-24 year olds in CA with a “some college” education). This type of depth will be crucial for targeted online campaigns going forward.
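Because the estimates live at the level of fine demographic cells, a subgroup estimate such as the one in the example is a population-weighted average over the cells that belong to that subgroup. The sketch below assumes cells keyed by hypothetical (state, age, education, gender) tuples with made-up estimates and counts; it illustrates the idea and is not the format of our data file.

```python
def subgroup_estimate(cell_estimates, population, keep):
    """Population-weighted estimate over the cells for which keep(cell) is True."""
    selected = {cell: n for cell, n in population.items() if keep(cell)}
    total = sum(selected.values())
    if total == 0:
        raise ValueError("no population in the requested subgroup")
    return sum(n * cell_estimates[cell] for cell, n in selected.items()) / total


# Hypothetical cell-level estimates keyed by (state, age, education, gender).
cell_estimates = {
    ("CA", "18-24", "some college", "F"): 0.30,
    ("CA", "18-24", "some college", "M"): 0.20,
    ("CA", "25-34", "some college", "F"): 0.25,
    ("TX", "18-24", "some college", "F"): 0.28,
}
# Hypothetical adult population counts for the same cells.
population = {
    ("CA", "18-24", "some college", "F"): 600_000,
    ("CA", "18-24", "some college", "M"): 400_000,
    ("CA", "25-34", "some college", "F"): 700_000,
    ("TX", "18-24", "some college", "F"): 500_000,
}


def ca_18_24_some_college(cell):
    state, age, education, _gender = cell
    return state == "CA" and age == "18-24" and education == "some college"


ca_estimate = subgroup_estimate(cell_estimates, population, ca_18_24_some_college)
print(f"{ca_estimate:.2f}")  # 0.26
```

The same cell table answers any subgroup query; only the filter changes.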

Cost-Effective: This polling is feasible because it costs less than one-tenth as much as traditional polling.

Accurate: Our state-of-the-art method uses machine learning to convert whatever polling data we collect into results that are at least as accurate as those from traditional random-digit-dialing polls (as shown in real event predictions and in an academic validation study in collaboration with Pew Research).

Speed: National polls can run in under an hour, although this dataset does not take advantage of that. This is very useful when we are trying to gauge the immediate impact of an event.

We limited this experiment to big companies that (1) are highly visible and connected with the general population and (2) are synonymous with their brands. For example, we can only rely on companies that are consumer-facing (it would make little sense to capture purchasing behavior for Boeing; after all, when is the last time you bought a 757?) and whose name is synonymous with, or at least visible on, the product (most consumers who buy baby diapers do not know that they are buying products from Procter and Gamble). In total, we have been tracking 18 S&P 500 companies that fulfill these requirements: Amazon, Bed Bath & Beyond, Best Buy, Costco, CVS, Dollar General, Dollar Tree, Home Depot, Kroger, Lowe’s, McDonald’s, PayPal, Rite Aid, Starbucks, Target, Walgreens, Walmart, and Whole Foods (recently acquired by Amazon). We collect survey data on each of these companies at the beginning of each month.

We are introducing this dataset before writing an academic paper, or letting the data run too long, because we are interested in sharing it and getting thoughts on outcomes. In politics, we are both experts in the outcome space and have access to a lot of the complementary data. But finance is a slightly different outcome space, and there are a lot of people (like maybe you?) who have proprietary or dispersed data. So we can run some toy outcomes, but we need to find partners to really understand how and where this data makes a statistically significant difference.

Having validated the signal of our survey variable (the results hold up, and remain statistically significant, in multivariate regressions controlling for the overall size of the S&P 500), we can use our data to predict which company will register the biggest relative increase in stock price from June to July.
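One way to run this kind of check is an ordinary least-squares regression of each company's next-month stock return on the survey measure plus a market control, followed by a ranking of companies by predicted return. The sketch assumes the control is the S&P 500's return over the same month (our reading for illustration, not a detail stated above), fits synthetic data generated from known coefficients so the fit can be verified, and uses placeholder company names. It does not reproduce our actual model or predictions.

```python
def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y.

    X is a list of rows (the caller adds the intercept column). Uses
    Gauss-Jordan elimination with partial pivoting, which is fine for a
    handful of regressors but not for ill-conditioned designs.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("singular design matrix")
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Synthetic company-months (NOT real data): survey share planning to spend
# more, and the S&P 500 return over the following month (assumed control).
survey = [0.10, 0.20, 0.15, 0.30, 0.25, 0.18]
market = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01]
# Returns generated from known coefficients so the fit can be verified.
returns = [0.01 + 0.5 * s + 1.2 * m for s, m in zip(survey, market)]

X = [[1.0, s, m] for s, m in zip(survey, market)]
intercept, b_survey, b_market = ols(X, returns)

# Rank hypothetical companies by predicted relative change for next month,
# holding the expected market return at zero (an assumption).
latest_survey = {"Company A": 0.28, "Company B": 0.19, "Company C": 0.24}
predicted = {name: intercept + b_survey * s for name, s in latest_survey.items()}
ranking = sorted(predicted, key=predicted.get, reverse=True)
print(ranking)  # ['Company A', 'Company C', 'Company B']
```

Since every company faces the same market return in a given month, the market term shifts all predictions equally; the ranking is driven by the survey coefficient.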
