This is an addendum to another piece I wrote, in which I attempt to model both pollster herding and sample bias as possible explanations for the significant polling error seen at the 2019 Australian federal election. If you haven’t read that, it’s available here; it’s probably helpful to go through some of the background on what I mean by each (and the model I use) to understand the rest of this piece.
A few definitions:
Pollster herding: This is where the polling industry, for whatever reason, releases a set of polls which are a lot closer to each other than should be expected. Broadly speaking, even if pollsters used exactly the same methods, questions, weights etc. (which they don’t), their polls should still vary within a certain range due to random chance (aka sampling error), e.g. occasionally sampling more Labor supporters than usual. There are a variety of reasons why polls might end up much closer to each other than they should be, ranging from fear of being the sole outlier to get it wrong (which can cost a pollster contracts; Roy Morgan lost their contract with the Bulletin in 2001 after their poll massively missed the Coalition’s re-election) to selective questioning of outliers on one end but not the other (Nate Silver’s hypothetical Iowa pollster is a case in point).
Pollster herding can seem like a very innocent thing from the inside: very often, when you get a really weird result, it might well be due to some error which you should correct for. The problem arises when outliers on one end are examined but not outliers on the other (or, for that matter, when only results which conflict with “conventional wisdom” are questioned), which can systematically skew every poll towards one side or the other. Herding can actually increase the accuracy of individual polls (data from my poll herding simulator, which you can download here):
However, by increasing the correlation between polls (such that poll errors don’t “cancel out” as often in averages), poll herding makes poll averages less accurate:
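To make the mechanism concrete, here’s a minimal sketch of the kind of simulation my herding simulator runs – note this is a toy version, not the actual simulator, and the true 2pp, sample size, number of polls and herding strength below are all assumptions. Each pollster draws an independent sample; under herding, it then shrinks its published figure towards the average of previously-published polls:

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_2PP = 0.515      # assumed "true" Coalition 2pp
N_POLLS, SAMPLE = 10, 1500
HERD_WEIGHT = 0.6     # assumed strength of shrinkage towards the herd

def run_cycle(herd: bool) -> np.ndarray:
    """Simulate one cycle of N_POLLS polls, with or without herding."""
    published = []
    for _ in range(N_POLLS):
        raw = rng.binomial(SAMPLE, TRUE_2PP) / SAMPLE        # independent sample estimate
        if herd and published:
            raw = (1 - HERD_WEIGHT) * raw + HERD_WEIGHT * np.mean(published)
        published.append(raw)
    return np.array(published)

for herd in (False, True):
    polls = np.array([run_cycle(herd) for _ in range(20_000)])
    individual_mae = np.abs(polls - TRUE_2PP).mean()             # error of single polls
    average_mae = np.abs(polls.mean(axis=1) - TRUE_2PP).mean()   # error of the poll average
    print(f"herding={herd}: individual MAE={individual_mae:.4f}, "
          f"poll-average MAE={average_mae:.4f}")
```

Run this and you should see the individual-poll error fall under herding while the error of the poll average rises – the average gains less because the herded polls’ errors are correlated and no longer cancel out.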
Sample bias: Bias is the statistical term for when a technique (e.g. polls) for measuring something (e.g. voting intention for the Coalition) systematically differs from what the true value is. This does not imply that someone or something is intentionally skewing the polls to get a preferred result.
In polling, pollsters have to find a group of people to ask questions (also known as sampling). Their methods for doing so may systematically over- or under-represent certain groups of people, which can lead to sample bias if these groups differ from the rest of the population on the characteristic of interest (e.g. if polls reach two men for every woman, and women are more likely to vote Labor, then unweighted polls may under-estimate the Labor vote). Sample bias can often be partially or mostly corrected for by weighting (giving under-represented groups more weight in polls to produce more representative results), but sometimes pollsters either can’t weight for something (e.g. if people who respond to polls and people who don’t respond differ despite sharing identical demographics) or don’t know what they need to weight by (e.g. failing to weight by education because historically it had little effect on how people voted).
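To illustrate the weighting fix with the hypothetical gender example above (all shares and vote figures below are made up for illustration), each respondent can be weighted by their group’s population share divided by its sample share:

```python
# A toy example of post-stratification weighting on one variable (gender),
# using the hypothetical numbers above: the sample reaches two men for every
# woman, and women are more likely to vote Labor. All figures are made up.
population_share = {"men": 0.49, "women": 0.51}    # assumed population benchmarks
sample_share     = {"men": 2 / 3, "women": 1 / 3}  # two men reached for every woman
labor_vote       = {"men": 0.30, "women": 0.40}    # hypothetical Labor vote within each group

unweighted = sum(sample_share[g] * labor_vote[g] for g in sample_share)

# Each respondent's weight = their group's population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}
weighted = (sum(sample_share[g] * weights[g] * labor_vote[g] for g in sample_share)
            / sum(sample_share[g] * weights[g] for g in sample_share))

print(f"Unweighted Labor estimate: {unweighted:.1%}")   # skewed towards men's vote
print(f"Weighted Labor estimate:   {weighted:.1%}")     # matches the population mix
```

The unweighted figure leans towards the men’s behaviour simply because men were easier to reach; the weighted figure recovers the population mix.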
Side note, because I know this is going to come up whenever people see sample bias:
There is no evidence of a systematic skew against the Coalition, or conservative/nationalist parties more broadly, in Australian polling (and if there were, why wouldn’t pollsters correct for it?). Of the final polling averages for the elections held in the 2016–2019 cycle for which I can find 2-party-preferred figures, Labor was under-estimated in 2 of 5 (WA and VIC, the latter being a bigger polling error than the 2019 federal polling error). In the two state elections we’ve had polling for since the 2019 Australian federal election (QLD and WA), Labor was under-estimated both times (and the under-estimate at the WA election was slightly bigger than the over-estimate at the 2019 federal election). Looking at other “politically incorrect” issues:
Polling in Queensland (where most pollsters split out One Nation) actually very slightly overestimates support for One Nation, on average (and did so in the 2020 Queensland state election).
Polling in WA has historically somewhat over-estimated support for One Nation at state elections (by about 1%, which is pretty big once you account for the fact that they don’t run in every electorate), and did so again at the recent state election.
Depending on how you average the polls, polling in NSW 2019 either over-estimated the One Nation vote or nailed it to within 0.1% (pollsters usually didn’t break out One Nation from Others in NSW, prior to 2019).
The largest over-estimate of the Coalition vote in recent elections came at an election (2018 VIC) where the Liberals emphasised “African youth gangs” and other social issues. Voters who supposedly lie about their vote for fear of being called bigoted, why hast thou forsaken thy shy Tory theory?
I could go on (and will probably write a piece going through the evidence for this [edit: that’s now up, here]), but, bluntly speaking, there is no evidence for a “shy Tory” effect at all in Australia. Whatever unrepresentativeness exists in polling samples has historically been accounted for through the standard weights used by pollsters, with little evidence that there is some group of conservatives out there who systematically lie about their voting intention when asked (or refuse to answer the survey).
Of course, the standard weighting and sampling methods used by pollsters don’t always work perfectly, which is why we’re here. In particular, the Association of Market and Social Research Organisations (AMSRO) report on the 2019 Australian polling failure notes that a failure to weight by education may have caused some of the polling error, which we will examine further here.
Weighting by education
By the AMSRO estimates (Table 21 in the above report), weighting by education would significantly reduce the error on the primary votes seen in the 2019 polls. A reminder of what the actual polling looked like:
(if you’re on a mobile device, scroll right for full data or turn your device landscape)
Pollster | Coalition | Labor | Green | PHON | Others | 2pp (Coalition) |
---|---|---|---|---|---|---|
Newspoll | 38 | 37 | 9 | 3 | 13 | 48.5 |
YouGov | 39 | 37 | 9 | 3 | 12 | 49 |
Ipsos | 39 | 33 | 13 | 4 | 11 | 49 |
Essential | 38.5 | 36.2 | 9.1 | 6.6 | 9.6 | 48.5 |
Morgan | 38.5 | 35.5 | 10 | 4 | 12 | 48 |
Average | 38.6 | 35.7 | 10 | 4.1 | 11.5 | 48.6 |
Error (average − result) | -2.8 | +2.4 | -0.4 | +1 | -0.3 | -2.9 |
As the above table demonstrates, most of the error on the 2pp came from an under-estimate of the Coalition primary vote and an over-estimate of Labor’s primary vote. The table also demonstrates how unnaturally clustered the 2pp estimates were (all of them between 48% and 49%; I estimate a less than 2% chance that the final polls would be this clustered, or more so, if they were independent), which points to the possibility of herding in the 2019 polling.
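If you want a ballpark for that clustering probability yourself, a quick Monte Carlo will do it – the effective sample size of about 1,500 per poll is my assumption, and real polls have design effects that make tight clustering even less likely than this suggests:

```python
import numpy as np

rng = np.random.default_rng(0)
N_SIMS, N_POLLS, SAMPLE = 100_000, 5, 1500
TRUE_2PP = 0.486          # centre the simulation on the polls' own average

draws = rng.binomial(SAMPLE, TRUE_2PP, size=(N_SIMS, N_POLLS)) / SAMPLE
published = np.round(draws * 200) / 2     # report to the nearest 0.5%, as the pollsters do
spread = published.max(axis=1) - published.min(axis=1)

# Chance that five independent polls all land within a one-point band (e.g. 48-49)
print(f"P(spread <= 1 point) = {(spread <= 1.0).mean():.1%}")
```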
The AMSRO report provides two estimates of what would change if education were included in poll weighting, drawn from two case studies: one from the Comparative Study of Electoral Systems (CSES) and one from the Australian Election Study (AES). For the purposes of highlighting the potential impact of herding, I’ve opted to use the CSES figures here, as they show a greater reduction in error when weighting by education is adopted. Applying the CSES reductions in error uniformly to the 2019 polling, we get the following:
(if you’re on a mobile device, scroll right for full data or turn your device landscape)
Pollster | Coalition | Labor | Green | PHON | Others | 2pp (Coalition) |
---|---|---|---|---|---|---|
Newspoll | 39.5 | 37 | 8 | 3.5 | 12 | 49.5 |
YouGov | 40.5 | 37 | 8 | 3.5 | 11 | 50 |
Ipsos | 40.5 | 33 | 12 | 4.5 | 10 | 50.5 |
Essential | 40 | 36.2 | 8 | 6.9 | 8.9 | 49.5 |
Morgan | 40 | 35.5 | 9 | 4.5 | 11 | 49.5 |
Average | 40.1 | 35.7 | 9 | 4.6 | 10.6 | 49.8 |
Error (average − result) | -1.3 | +2.4 | -1.4 | +1.5 | -1.2 | -1.7 |
2pp figures for all other pollsters were estimated using last-election preference flows. All 2pp figures rounded to nearest 0.5% to match our pollsters' reporting conventions.
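For transparency, the two steps behind the table above look roughly like this in code: shift each pollster’s primary votes by a uniform amount, then convert the primaries to a Coalition 2pp via preference flows. The shift and flow figures below are illustrative placeholders (roughly in line with the tables above and with last-election preference behaviour), not the exact CSES or AEC numbers, so don’t expect them to reproduce every cell:

```python
# Illustrative uniform primary-vote shifts (in points) and preference flows to
# the Coalition; both are rough placeholders, not official CSES/AEC figures.
PRIMARY_SHIFT = {"COA": +1.5, "ALP": 0.0, "GRN": -1.0, "PHON": +0.5, "OTH": -1.0}
FLOW_TO_COALITION = {"COA": 1.0, "ALP": 0.0, "GRN": 0.18, "PHON": 0.50, "OTH": 0.55}

def adjusted_2pp(primaries: dict) -> float:
    """Shift the primary votes, renormalise to 100, and derive a Coalition 2pp."""
    shifted = {p: v + PRIMARY_SHIFT[p] for p, v in primaries.items()}
    total = sum(shifted.values())
    shifted = {p: 100 * v / total for p, v in shifted.items()}
    two_pp = sum(shifted[p] * FLOW_TO_COALITION[p] for p in shifted)
    return round(two_pp * 2) / 2          # nearest 0.5%, matching reporting conventions

newspoll = {"COA": 38, "ALP": 37, "GRN": 9, "PHON": 3, "OTH": 13}
print(adjusted_2pp(newspoll))             # 49.5 with these placeholder flows
```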
Weighting by education would thus have reduced the error on the Coalition’s primary vote while slightly increasing the error on the minor parties’ primary votes; this somewhat reduces the error on the 2pp estimates. While not perfect, this set of polls is definitely closer to the final result than the ones we got, with at least one pollster correctly calling the 2pp winner (Ipsos) and one more showing a dead heat (YouGov). Off these figures, a Labor victory would probably have been correctly forecast to be unlikely, given that the swing (0.6% to Labor) wouldn’t even be enough to knock out a single Coalition marginal on the pre-2019 pendulum.
How herding interacts with sample bias
However, the low level of variance in the polling would still be highly indicative of herding. As I note above, herding means that pollsters are unlikely or unwilling to publish outliers; if the pollsters are getting the result broadly right (e.g. if they had the Coalition on 51–52%), this can actually make their polls more accurate than if they didn’t herd.
For example, let’s say that pollsters had gotten a result of 51.5% to the Coalition several weeks out, and herded towards that instead of the roughly 53% to Labor they actually converged on in our world. What would herded polls have looked like under our model of pollster herding?
As I noted in my previous piece, herding only causes problems when pollsters herd towards an incorrect result, either because of a genuine shift in voter intention (e.g. maybe voters really did intend to vote 53–47 for Labor in early 2019?) or because of herding towards incorrect polls (e.g. maybe a few rogue polls showed a really good result for Labor after the removal of Turnbull, and everyone herded towards them). In a world where voting intention is baked in early on and the early polls get it right, herding actually makes polls more accurate, not less. Interestingly, this suggests the possibility of developing more accurate polling-based models by measuring the amount of volatility (or swing) in the polls in the lead-up to the campaign. I’ll probably have a look at this before the next federal election and incorporate it into a model, if it’s relevant.
Of course, pollsters can’t count on either of those conditions holding when conducting polls. However, we now know that a failure to weight for education likely caused a significant chunk of the 2019 error. Which brings up the question – could pollsters have avoided the 2019 polling failure had they simply weighted by education from the beginning, and continued herding anyway?
Before I go through the results of the simulation, a caveat: we don’t know what the exact sample bias or degree of herding was at the 2019 election. We can approximate both with modelling, but given that no pollster has released their raw data, even to the AMSRO report panel, we can’t know the exact issues that led the polls astray in 2019. We also won’t know, ahead of time, what the sample bias is (or the exact degree of herding – we might still be able to approximate it, given enough data) – if we did, we could predict the election with perfect accuracy. These analyses are more of an attempt to understand what happened than to predict what will happen, especially since pollsters have changed, or will likely change, their methods in response to the polling failure.
Correcting for sample bias and its interaction with herding
Here, we assume the best-case scenario for an improvement in polling accuracy from weighting by education (i.e. using the CSES findings mentioned above), and that the changes in polling would have been roughly similar going back a couple of months. From the above table, this implies a shift to the Coalition of about 1.2% on the 2pp; applied to the early 2019 polls, we estimate a starting Coalition 2pp of about 48%. Using our model from the last piece, what might the polls have looked like with a weaker Labor vote to herd towards?
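Before getting to the simulation output, here’s a rough toy stand-in for that setup – to be clear, this is not the model from the previous piece, and the drift in voting intention, herding strength, sample size and number of polls are all assumptions. It reuses the herding loop sketched earlier, but starts the underlying 2pp at the education-weighted 48% and drifts it towards the actual 51.5%:

```python
import numpy as np

rng = np.random.default_rng(1)
N_SIMS, N_POLLS, SAMPLE = 5_000, 12, 1500
HERD_WEIGHT = 0.6                  # assumed strength of herding
START, FINAL = 0.48, 0.515         # education-weighted starting 2pp vs the actual result

final_averages = []
for _ in range(N_SIMS):
    published = []
    for i in range(N_POLLS):
        # Underlying Coalition 2pp drifts from the early-campaign reading to the result
        truth = START + (FINAL - START) * i / (N_POLLS - 1)
        raw = rng.binomial(SAMPLE, truth) / SAMPLE
        if published:
            raw = (1 - HERD_WEIGHT) * raw + HERD_WEIGHT * np.mean(published[-3:])
        published.append(raw)
    final_averages.append(np.mean(published[-5:]))    # average of the last five polls

# Comes out around 50% in this toy setup: still short of 51.5%, i.e. skewed to Labor
print(f"Mean final poll average: {np.mean(final_averages):.1%} (actual result: 51.5%)")
```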
Interestingly, we get a distribution (under herding) which pretty closely matches the adjusted polls we found above (marked in green). More importantly, the polls are overall still systematically skewed to Labor, even without a sample bias; as long as the pollsters herd and the vote starts out more friendly to Labor than the final result, they still end up skewing left. This is matched by a histogram of the simulated 2pp in polling averages:
And as in our last piece, the polls will remain under-dispersed if the pollsters herd:
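One simple way to quantify that under-dispersion – here applied to the real final 2019 polls from the first table, and assuming an effective sample size of roughly 1,500 per poll – is to compare the spread of the published 2pp figures with the spread sampling error alone should produce:

```python
import numpy as np

published_2pp = np.array([48.5, 49.0, 49.0, 48.5, 48.0])  # final 2019 polls, first table above
expected_sd = 100 * np.sqrt(0.5 * 0.5 / 1500)              # sampling error alone: ~1.3 points

observed_sd = published_2pp.std(ddof=1)
print(f"observed SD = {observed_sd:.2f} pts, expected = {expected_sd:.2f} pts")
print(f"dispersion ratio = {observed_sd / expected_sd:.2f} (well below 1 => under-dispersed)")
```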
As the above demonstrates, even under the best-case scenario for polling accuracy from weighting by education, pollsters would likely still have over-estimated Labor in their polling had they herded, and done so all the way to election day. Unless the polling several months out is fairly close to the final result – an unlikely event, considering polls only rapidly approach the final result at around the 100-day-from-election mark – herded polls will likely be systematically skewed towards whichever side is leading further out from the election.
(Alternatively, polls can also herd towards the “conventional wisdom”, whatever that is; but I don’t have a way of quantifying the conventional wisdom yet and hence I’ve left it out of this piece)
What this means, going forward
After the 2019 Australian polling failure, it seems that pollsters have reviewed their methods and weights in order to avoid a repeat (we know that YouGov has switched to purely online polling, for example). Furthermore, although this isn’t a formal analysis, polling since the 2019 election does look to have had a lot more variation between pollsters – for example, the odd Essential poll showing Labor ahead while Newspoll and others had the Coalition in front, or, more recently, significant variation in the Labor primary in the first half of 2021. This somewhat assuages fears that pollsters might be herding towards another polling failure, and the recent announcement of a Code of Conduct by the new Australian Polling Council suggests increased transparency in polling, which may help reduce the odds of further herding.
At the same time, Nate Silver suggests that, in the US, pollster herding increases rapidly the closer a poll is released to the election. Again, this isn’t a formal analysis, but eyeballing the trend in the polling leading up to the 2019 Australian election, it does seem like polls converge as an election approaches (compare the variation in the polling for the May 2017 – Aug 2017 period to the variation in polling a month out from the election; Labor led by similar margins in both, but the former had some outliers with the Coalition ahead). If that is the case, then we don’t know whether polls will herd again at the 2021/2022 Australian federal election; the drought of polling since the 2019 polling failure means we’ve had very few state polls we could use to infer the presence or absence of herding.
What we do know is that, as the above analysis demonstrates, even if polls correct previously unrepresentative samples, herding can still lead to increased polling error. Given that polls are attempting to measure an unknown quantity, attempts to “adjust” results perceived to be outliers can get it completely wrong and lead to worse polling than if the results had been published as-is.