Do Outlier Polls Tend To Get It Right?

Over the past few months, some pollsters have begun to publish very different voting-intention figures from the rest of the industry. Historically, have such outliers ended up closer to the mark than the consensus?

Resolve, the Nine News pollster/Shows a high Indy vote

(this has absolutely nothing to do with me realising that “Resolve” is a perfect substitute for “Rudolph”)

Throughout 2021 (or 12021 for you Human Era fans), four pollsters have published federal voting-intention estimates with varying degrees of regularity:

Newspoll: Administered by the international pollster YouGov, and commissioned by News Corp. Respondents are wholly selected from its online panel and responses are weighted by age, education, gender, location/region and household income. Provides disclosures on its methodology as YouGov is a member of the Australian Polling Council.
Essential Report: Administered by the pollster Essential Research and often reported in The Guardian (though it has no commissioning source). Respondents are wholly selected from a provided online panel and responses are weighted by age, gender, location and party ID. Provides disclosure on methodology as Essential is also a member of the Australian Polling Council.
Morgan Poll: Administered by the pollster Roy Morgan, with no commissioning source. Respondents are apparently sourced through telephone and online interviews, with no information on the proportions for each or the weighting frames used. No methodology disclosure beyond whatever Roy Morgan feels like including in its findings.
Resolve Political Monitor: Administered by the new polling firm Resolve Strategic, and commissioned by the Nine newspapers (Sydney Morning Herald/Age). Respondents are apparently sourced from online panels (with some polls including a telephone component) using quota sampling. Quotas and weights for age, gender and area are apparently employed, as well as “other demographic or lifestyle attributes”.

For most of the year, these polls showed a broadly similar picture on the Labor-vs-Coalition contest. However, in the last 4 months, Resolve has diverged from the other polls in painting a worse picture for Labor:

Polling for Labor's first-preference vote intention over time, split up by pollster. — I use a LOESS curve above to show the general trend for each poll series, which may smooth out the suddenness of Resolve’s divergence.

In that same timeframe, Resolve has started to show a broadly higher Coalition voting-intention as compared to other pollsters:

Polling for the Coalition's first-preference vote intention over time, split up by pollster.

This has resulted in Resolve being the only pollster suggesting that the Coalition is ahead on a 2-party-preferred basis:

Estimated two-party-preferred voting-intention over time, split up by pollster. — With the exception of the November 2021 Resolve Political Monitor. Two-party-preferred estimates calculated using last-election preference flows, treating the Independent option in Resolve polls as if it were part of Others (see here for an example of how it’s done)
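For readers who want to see the arithmetic behind a last-election-preference estimate, here is a minimal Python sketch. The primary votes and preference-flow rates are made-up placeholders (not real poll figures or actual flows from any election), and the function name `two_party_preferred` is hypothetical. As in the caption above, an Independent option would simply be folded into Others before running this.

```python
def two_party_preferred(primaries, flows_to_labor):
    """Estimate Labor and Coalition two-party-preferred shares.

    primaries: primary votes by party; must include "ALP" and "L/NC".
    flows_to_labor: share of each minor grouping's vote assumed to flow
        to Labor (the remainder is assumed to flow to the Coalition).
    """
    total = sum(primaries.values())
    labor = primaries["ALP"] + sum(
        primaries[party] * rate for party, rate in flows_to_labor.items()
    )
    labor_2pp = 100 * labor / total  # rescale in case primaries don't sum to 100
    return labor_2pp, 100 - labor_2pp


# Placeholder inputs for illustration only.
example_primaries = {"ALP": 35, "L/NC": 40, "GRN": 10, "ONP": 5, "OTH": 10}
example_flows = {"GRN": 0.80, "ONP": 0.35, "OTH": 0.50}

labor_2pp, coalition_2pp = two_party_preferred(example_primaries, example_flows)
print(f"ALP {labor_2pp:.1f} / L/NC {coalition_2pp:.1f}")
```

Swapping in the actual flows from the previous election is what turns this from a toy into a usable estimate.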

But this is quite peculiar/If you look at other polls

Beyond that, however, Resolve has published extremely high estimates of voting-intention for Independents (between 7% and 9%) as well as unusually low estimates of voting-intention for all parties other than Labor/Coalition/Greens/One Nation. This has resulted in its combined Others voting intention being higher than that of other pollsters:

Polling for the all other parties + independents vote intention over time, split up by pollster. — Of note, Essential is also reporting unusually low Others voting intention relative to other pollsters.

While we don’t know much about Resolve’s methodology, its unusually high estimates of Independent voting-intention most likely boil down to 1) refusing to give respondents an “Undecided” option and 2) giving all respondents a generic “Independent” option. Independents don’t always run in every seat, and the ones that do bring very different policies, experience, and reputations to the table – a left-wing voter may vote for a socialist Independent while a right-wing voter may vote for a conservative Independent – but neither would vote for the other’s independent.

Apart from Resolve, Morgan has also continued to produce higher estimates of Green voting intention than other pollsters:

Polling for the Greens' first-preference vote intention over time, split up by pollster.

With all that in mind…

Then one night, Christmas Eve/Ethan came to say/“Do outliers tend to get it right?/Or does the consensus hit the bullseye?”

Obviously, we can’t know ahead of time whether Resolve’s methods are a better way of estimating voting-intention (I’d even argue sampling error makes it hard for us to know after an election). But let’s have a look at historical federal elections to see whether polls with significant house effects (i.e. polls which show a noticeably different result from other pollsters) usually end up closer to the result. For this, I examined federal polls only (historical state polls are a little harder to come by) and compared the error on outlier polls to the error on the ‘polling consensus’:

Full methodology

Only federal polling was used, as many pollsters archive their federal voting-intention polling but not their state voting-intention polls.
Only elections with three or more final polls were included. A “final poll” is the last poll conducted by a given pollster, and it must have been conducted within 7 days of the election.
Polls were then recalculated such that all primary-vote figures added to 100%.
For the purposes of this analysis, only the primary vote for the Labor, Coalition, Greens (2001 and later) and all Other parties/groupings were considered. Polls could be classified as outliers in any of these voting-intention estimates.
A published voting-intention estimate was categorised as an “outlier” if it was both:
1. More than 1.5x the Inter-Quartile Range (IQR) beyond the upper or lower quartile of all polling – the “classical” definition of an outlier
2. More than 2% out compared to the median voting-intention estimate for all other polls
The second criterion was included to avoid situations where two or three polls produce the same estimate (e.g. 3%) and another poll produces a slightly different estimate (e.g. 4%). The last poll might be classified as an outlier under criterion 1) but would be excluded by criterion 2).
Poll errors were then calculated from 1) the outlier poll, 2) a simple average excluding the outlier estimate and 3) a simple average of all polling at that election.
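The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the quartiles use the standard library's "inclusive" method (the methodology doesn't say which quartile method was used, and with only a handful of polls the choice can matter), and the example estimates and election result are invented.

```python
from statistics import mean, median, quantiles


def normalise(primaries):
    """Rescale primary votes so that they add to 100%."""
    total = sum(primaries.values())
    return {party: 100 * share / total for party, share in primaries.items()}


def is_outlier(estimates, i, iqr_multiplier=1.5, min_gap=2.0):
    """Apply both criteria to the i-th estimate in a list of final polls."""
    q1, _, q3 = quantiles(estimates, n=4, method="inclusive")  # assumed method
    iqr = q3 - q1
    x = estimates[i]
    # Criterion 1: more than 1.5x IQR beyond the quartiles of all polling.
    outside_fences = x < q1 - iqr_multiplier * iqr or x > q3 + iqr_multiplier * iqr
    # Criterion 2: more than 2 points from the median of all *other* polls.
    others = estimates[:i] + estimates[i + 1:]
    far_from_median = abs(x - median(others)) > min_gap
    return outside_fences and far_from_median


def poll_errors(estimates, i, result):
    """Signed errors: positive means the vote was over-estimated."""
    others = estimates[:i] + estimates[i + 1:]
    return {
        "outlier": estimates[i] - result,
        "consensus": mean(others) - result,
        "all_polls": mean(estimates) - result,
    }


# The 3%/3%/3%/4% case: flagged by criterion 1 alone, excluded by criterion 2.
print(is_outlier([3, 3, 3, 4], 3))          # False
# Invented example where one poll sits well away from the rest.
print(is_outlier([38, 38.5, 39, 44], 3))    # True
print(poll_errors([38, 38.5, 39, 44], 3, result=40))
```

Passing the index of the poll under test, rather than its value, keeps the "all other polls" median well-defined even when two pollsters publish the same figure.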

Election	Pollster	Outlier in:	Outlier poll error	"Consensus" error	Average error on individual 'consensus' polls
1993	Nielsen	L/NC	3.7	0.7	± 0.7
2001	Morgan	ALP	5.7	0.5	± 0.5
2001	Morgan	L/NC	-4.4	3.1	± 3.1
2004	Morgan	Greens	2.3	-0.2	± 1.5
2007	Nielsen	ALP	4.6	-0.1	± 0.5
2007	Nielsen	L/NC	-2.1	0.2	± 0.6
2007	Nielsen	Others	-2.7	-0.4	± 0.6
2010	Newspoll	ALP	-1.8	0.7	± 0.7
2013	Nielsen	Others	-2.4	-0.5	± 0.6
2013	Morgan	Others	1.6	-0.5	± 0.6
2016	Ipsos	L/NC	-2	0.6	± 1
2016	Ipsos	Greens	2.8	0.4	± 0.6
2019	Essential	Others	-2.1	0.3	± 0.7
2019	Ipsos	ALP	-0.3	3.1	± 3.1
2019	Ipsos	Greens	2.6	-1.1	± 1.1
	Average		2.8	0.8	1.1

See above for definition of "outlier". "Consensus" refers to all polls, excluding the outlier poll(s). In the error columns, positive values refer to an over-estimate of voting intention while negative values refer to an under-estimate.
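For anyone wanting to check the averages, here is a small Python sketch that recomputes the mean absolute error from the (rounded) figures in the table and counts how often the outlier beat the consensus. The variable names are mine, and because the inputs are already rounded to one decimal place, the recomputed averages may differ slightly from the published ones.

```python
# (election, pollster, party, outlier poll error, consensus error),
# transcribed from the table above.
rows = [
    (1993, "Nielsen", "L/NC", 3.7, 0.7),
    (2001, "Morgan", "ALP", 5.7, 0.5),
    (2001, "Morgan", "L/NC", -4.4, 3.1),
    (2004, "Morgan", "Greens", 2.3, -0.2),
    (2007, "Nielsen", "ALP", 4.6, -0.1),
    (2007, "Nielsen", "L/NC", -2.1, 0.2),
    (2007, "Nielsen", "Others", -2.7, -0.4),
    (2010, "Newspoll", "ALP", -1.8, 0.7),
    (2013, "Nielsen", "Others", -2.4, -0.5),
    (2013, "Morgan", "Others", 1.6, -0.5),
    (2016, "Ipsos", "L/NC", -2.0, 0.6),
    (2016, "Ipsos", "Greens", 2.8, 0.4),
    (2019, "Essential", "Others", -2.1, 0.3),
    (2019, "Ipsos", "ALP", -0.3, 3.1),
    (2019, "Ipsos", "Greens", 2.6, -1.1),
]

# Average the *absolute* errors so over- and under-estimates don't cancel.
outlier_mae = sum(abs(r[3]) for r in rows) / len(rows)
consensus_mae = sum(abs(r[4]) for r in rows) / len(rows)

# Cases where the outlier landed closer to the result than the consensus.
outlier_wins = [r[:3] for r in rows if abs(r[3]) < abs(r[4])]

print(f"Outlier MAE:   {outlier_mae:.2f}")
print(f"Consensus MAE: {consensus_mae:.2f}")
print("Outlier closer than consensus:", outlier_wins)
```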

The most important take-away here is that when a poll clearly diverges from its brethren, historically, it tends to experience massive errors (for context, the 2-party-preferred error in 2019 was about 3%). Of the fifteen outliers identified, there is just one (Ipsos 2019, ALP vote) where the outlier out-performed the consensus – not that it helped, given how much Ipsos over-estimated the Green vote.

This advantage is maintained even when comparing the average error on outlier polls to the average error on individual “consensus” polls – i.e. the error on each “consensus” poll rather than the error on an average of all “consensus” polling.

However, this does not imply that outlier polls should be ignored

I’ve mentioned this before, but predictors with larger error sizes aren’t necessarily “useless”. (The third way predictors can be useful mentioned in that piece, being a leading predictor, does not apply here, as all polls analysed are the final polls for their respective elections.) Outlier polls can still be useful in forecasting the election if they:

Correct the bias of the consensus. If the consensus skews one way but outlier polls tend to skew the opposite way, averaging the two may produce an unbiased picture of voting-intention. Unfortunately with just 15 outliers federally, it’s not possible to calculate whether or not the “consensus” tends to skew one way. It’s also worth noting that Australian polls in general show no systematic bias.
Tend to have errors in the opposite direction from the consensus. If outliers tend to under-estimate party X when the consensus over-estimates them, and over-estimate X when the consensus under-estimates them, they can be averaged with the consensus to produce a more accurate predictor.

The evidence for outliers erring in the opposite direction from the consensus is…mixed. There is indeed a very weak negative correlation between outlier errors and consensus errors (though it is not statistically significant):

Consensus polling error plotted against outlier polls' errors. — “Consensus” refers to an average of all polls excluding those deemed “outliers”. The grey shaded area represents the prediction interval for the fitted line in solid black.
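To make the "weak negative correlation" concrete, here is a sketch that computes a Pearson correlation coefficient from the rounded outlier and consensus errors in the first table. It is written out by hand rather than using a statistics package, the names are my own, and it makes no attempt at a significance test or at reproducing the fitted line in the chart.

```python
from math import sqrt

# Signed errors transcribed from the first table, in row order.
outlier_errors = [3.7, 5.7, -4.4, 2.3, 4.6, -2.1, -2.7, -1.8,
                  -2.4, 1.6, -2.0, 2.8, -2.1, -0.3, 2.6]
consensus_errors = [0.7, 0.5, 3.1, -0.2, -0.1, 0.2, -0.4, 0.7,
                    -0.5, -0.5, 0.6, 0.4, 0.3, 3.1, -1.1]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson_r(outlier_errors, consensus_errors)
print(f"r = {r:.2f}")  # negative, but a long way from -1
```

With only fifteen data points, a coefficient of this size is well within what chance alone could produce, which is consistent with the caveat above.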

Similarly, including outliers in the polling average does not appear to significantly affect the polling average’s accuracy:

Election	Party/grouping	Avg err, excl. outliers	Avg err, incl. outliers
1993	L/NC	0.7	1.7
2001	ALP	0.5	2.2
2001	L/NC	3.1	0.6
2004	Greens	-0.2	0.4
2007	ALP	-0.1	1.1
2007	L/NC	0.2	-0.4
2007	Others	-0.4	-1
2010	ALP	0.7	0.1
2013	Others	-0.5	-0.5
2016	ALP	0.3	-0.1
2016	L/NC	0.6	0.1
2016	Greens	0.4	0.8
2016	Others	-1	-0.6
2019	ALP	3.1	2.4
2019	Greens	-1.1	-0.4
2019	Others	0.3	0.5
Average		0.83	0.81

In the error columns, positive values refer to an over-estimate of voting intention while negative values refer to an under-estimate.
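The comparison in this table can be tallied with a short Python sketch: it recomputes the two average absolute errors from the rounded figures and counts how often including outliers helped or hurt. The variable names are my own.

```python
# (election, party, avg error excl. outliers, avg error incl. outliers),
# transcribed from the table above.
rows = [
    (1993, "L/NC", 0.7, 1.7), (2001, "ALP", 0.5, 2.2),
    (2001, "L/NC", 3.1, 0.6), (2004, "Greens", -0.2, 0.4),
    (2007, "ALP", -0.1, 1.1), (2007, "L/NC", 0.2, -0.4),
    (2007, "Others", -0.4, -1.0), (2010, "ALP", 0.7, 0.1),
    (2013, "Others", -0.5, -0.5), (2016, "ALP", 0.3, -0.1),
    (2016, "L/NC", 0.6, 0.1), (2016, "Greens", 0.4, 0.8),
    (2016, "Others", -1.0, -0.6), (2019, "ALP", 3.1, 2.4),
    (2019, "Greens", -1.1, -0.4), (2019, "Others", 0.3, 0.5),
]

# Mean absolute error with and without the outlier polls in the average.
mae_excl = sum(abs(r[2]) for r in rows) / len(rows)
mae_incl = sum(abs(r[3]) for r in rows) / len(rows)

# How often did including outliers move the average closer to the result?
improved = sum(1 for r in rows if abs(r[3]) < abs(r[2]))
worsened = sum(1 for r in rows if abs(r[3]) > abs(r[2]))
unchanged = len(rows) - improved - worsened

print(f"MAE excl. outliers: {mae_excl:.2f}")
print(f"MAE incl. outliers: {mae_incl:.2f}")
print(f"improved={improved}, worsened={worsened}, unchanged={unchanged}")
```

The near-even split between improved and worsened cases is another way of seeing why the two averages end up almost identical.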

While there are some elections in which including outliers improved the polling average’s accuracy (e.g. the Coalition vote in 2001), there are others where including outliers made a polling error worse (e.g. the Coalition vote in 1993). Hence, the evidence on excluding or adjusting away outliers is unclear at best.

So what does this mean for polls in the lead-up to the next federal election?

For the most part, adjusting away outliers in voting-intention polling doesn’t appear to have a huge impact, at least in federal elections. My suggestion: treat outliers like any other poll. Average them with everyone else, and ignore the published two-party-preferred estimates (especially Morgan’s) in favour of last-election-preference estimates. While outliers tend to have higher errors than the polling consensus, their errors may be in the opposite direction from everyone else’s, so averaging everything together may paint a more accurate picture of voting-intention.
