Why were election polls so far off in 2020?
A likely culprit is "non-response bias" in the polling data, driven by people I call "poll trolls". These trolls declined to respond to polls as a way of trolling the system, so they were not fully accounted for in the data. Let me explain. 1/9
Pollsters know that not everyone will respond to their poll. People are busy and distracted. So they use a statistical technique called "reweighting" to adjust for that. Basically, they use people who responded to "stand in" for similar people who didn't respond. 2/9
For example, they might double count responses from some white male college-educated boomer voters to stand in for white male college-educated boomer voters who didn't respond to the survey. 3/9
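Here's a minimal sketch of that idea (cell-based reweighting, often called post-stratification), assuming we know each demographic cell's share of the electorate. The cell labels, shares and votes are made up for illustration. Each respondent gets weight = (cell's population share) / (cell's share of respondents), so the lone respondent in an under-represented cell counts double relative to each respondent in an over-represented one.

```python
from collections import Counter


def poststrat_weights(respondent_cells, population_shares):
    """Weight each respondent by population share / sample share of their cell."""
    n = len(respondent_cells)
    counts = Counter(respondent_cells)
    # population_share / (count / n), written as share * n / count
    return [population_shares[cell] * n / counts[cell] for cell in respondent_cells]


def weighted_mean(values, weights):
    """Weighted average of responses (e.g. 1 = supports candidate A, 0 = not)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical electorate: half college-educated, half not.
population = {"college": 0.5, "non_college": 0.5}
# But two of our three respondents are college-educated.
cells = ["college", "college", "non_college"]
votes = [1, 1, 0]

weights = poststrat_weights(cells, population)
print(weights)                        # → [0.75, 0.75, 1.5]
print(weighted_mean(votes, weights))  # → 0.5
```

The unweighted average would be 2/3; reweighting pulls it back to 0.5 because the non-college respondent now "stands in" for the missing non-college voters.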
Reweighting can work if every type of voter has stand-ins among survey respondents. But it can't do anything to adjust for the types who rarely or never respond to the poll. Using non-troll respondents as stand-ins for poll trolls won't fully correct the bias. And that's the problem. 4/9
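A toy calculation makes the problem concrete. All numbers below are hypothetical: two demographic cells, each containing some poll trolls who respond far less often and lean the other way. We compute the *expected* poll result, so there's no sampling noise. Reweighting restores each cell to its population share, but inside a cell the trolls are still under-represented, so the bias survives.

```python
from collections import defaultdict

# Hypothetical population. Each row:
# (demographic cell, is_troll, population share, response rate, support for A)
POPULATION = [
    ("college",     False, 0.40, 0.10, 0.60),
    ("college",     True,  0.10, 0.01, 0.20),
    ("non_college", False, 0.30, 0.06, 0.50),
    ("non_college", True,  0.20, 0.01, 0.20),
]


def true_support(groups):
    """Actual support for A across the whole population."""
    return sum(share * support for _, _, share, _, support in groups)


def raw_estimate(groups):
    """Expected unweighted average among respondents."""
    mass = sum(share * rate for _, _, share, rate, _ in groups)
    backing = sum(share * rate * support for _, _, share, rate, support in groups)
    return backing / mass


def poststratified_estimate(groups):
    """Expected estimate after reweighting on demographic cells only."""
    cell_share = defaultdict(float)    # population share of each cell
    resp_mass = defaultdict(float)     # expected respondents per cell
    resp_backing = defaultdict(float)  # expected respondents backing A per cell
    for cell, _is_troll, share, rate, support in groups:
        cell_share[cell] += share
        resp_mass[cell] += share * rate
        resp_backing[cell] += share * rate * support
    # Within a cell every respondent counts equally, so trolls stay under-weighted.
    return sum(cell_share[c] * resp_backing[c] / resp_mass[c] for c in cell_share)


print(f"true:           {true_support(POPULATION):.3f}")
print(f"raw poll:       {raw_estimate(POPULATION):.3f}")
print(f"reweighted:     {poststratified_estimate(POPULATION):.3f}")
```

With these made-up numbers, true support is 0.45, the raw poll says about 0.55, and reweighting by demographics only brings it down to about 0.53. Demographics can't see who is a troll.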
It doesn't matter who is running the polls, whether it's Fox News or a well-known university: all polling is part of "the system" these trolls want to mess with. Some of them may lie to the pollsters, but non-response can cause just as much bias. 5/9
This is one reason why we need to look for other data sources on poll trolls to predict overall voting patterns. How people click, share and comment when they're less aware of being watched may be a much better predictor of behavior for poll trolls. 6/9
@KevinRoose suggested that Facebook engagement data works well to capture trends among poll trolls that traditional polling misses. But we still need to figure out the best way to statistically adjust polling data based on it. (Machine learning helps.) 7/9 https://twitter.com/kevinroose/status/1323900526816227330
Many pollsters are already using sophisticated methods to incorporate this type of info into their polling strategy. But they need to refine their approaches using better machine learning tools or reweighting methods that adapt over time as the number of poll trolls changes. 8/9
@FiveThirtyEight has an excellent rundown of how polling has adapted since 2016. But pollsters clearly have more to learn. 9/9 https://fivethirtyeight.com/features/what-pollsters-have-changed-since-2016-and-what-still-worries-them-about-2020/
As an addendum, in case it isn't clear from my bio, my economic and health policy research uses reweighting and machine learning to address different types of sample selection bias, and I teach graduate courses in statistics. AMA!