Note: a fair bit more technical than usual, if you want to skip to what voting intention may have looked like with an honeymoon-adjusted left-anchored model in 2016-2019, click here. If you just want to read the summary for what this means for polling in the current (2019 – 2021/2022) term, click here.
My interest was piqued recently while reading Mark Graph’s write-up after the 2019 Australian federal election, in which he describes how poll-anchoring (specifically a left-anchored model) would have produced a more accurate final poll aggregate at the 2019 election than assuming the polls were not skewed to any side overall. However, he also noted how anchored models have, in the past, under-performed models which assumed that the polls would average out to be relatively unbiased; while I have not seen said models, I do have a hypothesis as to why this may be the case and how anchored models might be improved.
But first, some background:
What is anchoring?
Anchoring is basically the practice of using known election results to “anchor” the polls. For example, if an election just happened in which the Coalition won 51% of the 2-party-preferred vote (2pp), and a poll which comes out right after the election suggests the Coalition has 54% of the 2pp, anchoring would assume that the polls are broadly over-estimating the Coalition’s 2pp and adjust it downwards (by varying amounts, depending on the specific model and assumptions used).
Usually anchoring is done using past election results, a practice known as "left-anchoring": since graphs tend to be plotted left-to-right, assuming that polls taken right after an election should produce the same vote shares as that election effectively "anchors" the polling average at the left. In other words, poll anchoring assumes that polls taken at some point should produce estimated vote shares equal to some known quantity; if they don't, it must be because they are systematically skewed in some way (and that this skew is likely to continue until the next anchoring event).
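As a minimal sketch of the idea (in Python; the 51%/54% figures are taken from the example above, while the later poll value is a made-up number purely for illustration):

```python
# Minimal sketch of left-anchoring, using the illustrative numbers above
election_2pp = 51.0        # Coalition 2pp at the anchoring election
post_election_poll = 54.0  # a poll taken right after that election

# If post-election polls "should" match the election result, the gap is
# treated as a systematic skew that persists in later polls
estimated_skew = post_election_poll - election_2pp  # +3.0 pts to the Coalition

later_poll = 52.0  # a later poll we want to anchor (hypothetical value)
anchored_estimate = later_poll - estimated_skew     # 49.0 once the skew is removed
print(f"Estimated skew: {estimated_skew:+.1f} pts, anchored 2pp: {anchored_estimate:.1f}%")
```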
Why hasn’t anchoring worked as a predictive tool?
Well, there are many possible reasons. Usually poll-anchoring (when used as a tool to help forecast elections) anchors current polling to a previous known election result (i.e. left-anchoring) by comparing polling near that election with the election result. However, there are several potential problems with this. For example, there is usually a lull in polling right after an election (as media and viewers are less interested in knowing how the parties are travelling when they have just had an election to tell them), and hence there may be very few polls, making it harder to properly anchor (e.g. if you only have one poll, and it's out compared to the last election by 2%, is that because the poll is skewed or just due to random chance?).
Furthermore, the relative shortage of polls means that you might end up anchoring to a pollster with a known history of skewing one way or the other; for example, if the only polls you have are from a pollster who has always under-estimated the Labor vote, your model might end up artificially inflating the Labor vote for all pollsters in response. On the other hand, waiting too long for a larger sample to anchor to can also cause problems, as voting intention might change in the meantime and you might end up anchoring to the wrong result.
Most importantly, poll-anchoring assumes that the voting-intention you would get if you managed to get an honest response out of every voter on the day (or week, or month) after the election should be the same as the election result. However, what if there was a systematic shift in between an election result and how people would vote the day after an election?
To examine this, I've taken two-week averages of a newly elected government's 2pp in polling and compared them to the 2pp it was elected with. So, for example, if polls taken in the two weeks after the 2019 federal election (Coalition 51.53%) averaged out to a Coalition 2pp of 53%, then the bias would be 53% – 51.53% = +1.47%. Plotting these averages against the mid-points of each two-week period:
(the database of historical Australian federal voting-intention polls is available upon request. I will probably make it fully public at some point, but I do need to add more polls, clean it up and ensure everything is properly referenced first)
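For what it's worth, the two-week bounce calculation is simple enough to sketch in a few lines of Python; the file and column names here ('mid_date', 'govt_2pp') are hypothetical stand-ins for that database:

```python
import pandas as pd

# Sketch of the two-week honeymoon-bounce calculation, using the 2019 election
# as the example; assumes hypothetical columns 'mid_date' (fieldwork mid-point)
# and 'govt_2pp' (published 2pp for the newly elected government)
polls = pd.read_csv("federal_polls.csv", parse_dates=["mid_date"])

election_date = pd.Timestamp("2019-05-18")
election_govt_2pp = 51.53  # Coalition 2pp at the 2019 election

# Polls with fieldwork mid-points in the two weeks following the election
window = polls[(polls["mid_date"] > election_date) &
               (polls["mid_date"] <= election_date + pd.Timedelta(days=14))]

honeymoon_bounce = window["govt_2pp"].mean() - election_govt_2pp
print(f"Estimated honeymoon bounce: {honeymoon_bounce:+.2f} pts")
```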
(Government) honeymoons in Australia
Keep in mind that these are averages and that there tends to be a lot of variation in how well different governments are polling at the same point (a graph of the actual polls would look like this instead). Still, it seems pretty clear that the average government receives a honeymoon bounce of about +1% (in 2pp terms) in the weeks following the election. If one anchored polls to the last election result, this would systematically stuff up the model, as the polls usually used for left-anchoring – the ones conducted shortly after the election – will tend to suggest a government vote higher than at the last election; adjusting this away means the model will tend to under-estimate the government come election day.
However, this information is only useful if we can estimate the “true” size of the honeymoon bounce and if it is broadly similar between governments. If every honeymoon bounce is very different, then there is no way for poll-anchoring to improve on simply assuming the polls are correct as there would be no way to tell if an abnormally large honeymoon bounce (e.g. Rudd 2007) was due to polls over-estimating the government or an actual surge in voting intention for the government.
One way to tell whether honeymoon bounces are roughly similar is to compare the size of the honeymoon bounce to the final 2pp error at the next election. If, in a given government’s term, polls which estimate a larger honeymoon bounce also over-estimate the government on election day, then that suggests that at least part of the large honeymoon bounce was probably an over-estimate produced by a systematic skew to the government. On the other hand, if polls which show no honeymoon bounce (or a decline in voting intention for the government) are also more likely to under-estimate the government come election day, then that suggests that the government probably received a honeymoon bounce which was not picked up by polls due to a skew against the government.
If both are true, it suggests differences between honeymoon bounces are at least partly due to systematic skews in the polls (rather than being wholly or mostly due to actual shifts in voting intention) and therefore the size of the honeymoon bounce in polling can be used to adjust polls through left-anchoring. So, are they?
While this is a fairly small sample size (n = 12), the relationship between honeymoon bounces and the final polling error seems fairly strong (and is statistically significant). This means that, for most elections, somewhat discounting the honeymoon bounce and anchoring the polls to the last election result + honeymoon bounce should improve polling accuracy as compared to assuming the polls are not skewed on average.
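A regression along these lines could be fitted with something like the sketch below; the file and column names are hypothetical stand-ins for the twelve term-level observations (honeymoon bounce at two weeks, final polling error at the following election):

```python
import pandas as pd
from scipy import stats

# Sketch of the honeymoon-bounce vs final-error regression (one row per term);
# 'honeymoon_bounce' = govt 2pp bounce in the two weeks after the election,
# 'final_poll_error' = final poll average minus the govt 2pp at the next election
terms = pd.read_csv("honeymoon_vs_error.csv")

fit = stats.linregress(terms["honeymoon_bounce"], terms["final_poll_error"])
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}, "
      f"r^2 = {fit.rvalue**2:.2f}, p = {fit.pvalue:.3f}")

# The predicted election-day skew for a new term is then
# fit.intercept + fit.slope * honeymoon_bounce, which the anchoring model below uses
```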
I’ve come up with a relatively simple method for left-anchoring polls with a honeymoon adjustment (detailed in the accordion below). It’s a fairly quick and dirty method – in particular its estimation of house effects is probably less accurate than Mark’s Bayesian methods – but it’ll work fine as a comparison to a simple model which assumes the polls are accurate (unbiased) on average.
Next, I ran an unadjusted aggregation on the polls using LOESS (span = 0.2). Using this, I calculated house effects for each pollster by averaging the differences between the published 2pp and the 2pp trend generated by the LOESS aggregation.
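A rough sketch of this step (not Mark's Bayesian approach, just the quick-and-dirty LOESS version described here), assuming a poll table with hypothetical columns 'mid_date', 'pollster' and 'coalition_2pp':

```python
import pandas as pd
from statsmodels.nonparametric.smoothers_lowess import lowess

# Unadjusted LOESS aggregation plus simple house-effect estimates
polls = pd.read_csv("term_polls.csv", parse_dates=["mid_date"]).sort_values("mid_date")
x = (polls["mid_date"] - polls["mid_date"].min()).dt.days.to_numpy()

# LOESS trend through the raw polls (span = 0.2); return_sorted=False returns
# the fitted values in the same order as the input polls
polls["trend_2pp"] = lowess(polls["coalition_2pp"], x, frac=0.2, return_sorted=False)

# House effect = average gap between a pollster's published 2pp and the trend
house_effects = (polls["coalition_2pp"] - polls["trend_2pp"]).groupby(polls["pollster"]).mean()
print(house_effects.round(2))
```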
Using the honeymoon size vs final error model, I calculated a predicted skew on election day. For example, in 2016, given the non-existent honeymoon of the Abbott government, my model expected the final polls to be skewed against the government by 0.5%; hence the predicted skew would be 0.5% to Labor.
I then assumed the house effects for all pollsters should average out to this predicted skew (pollsters who either conducted very few polls (5 or fewer) and/or stopped polling midway through the term were not included in this calculation). If they didn't, every pollster's house effect was shifted by the same amount so that they did – so, for example, in 2016, the house effects of the pollsters averaged out at +0.1% to the Coalition; my model would then subtract 0.6% from every pollster's house effect so they averaged out to -0.5% instead.
Each poll was then adjusted by subtracting the house effect from the published 2pp, and a LOESS trendline (span = 0.2) was run through the adjusted polls. To get the final anchored average, I calculated the simple average of the adjusted polls instead of using the final reading from the adjusted LOESS trendline. While this makes the model's adjusted 2pp outputs slightly less accurate, I felt it was fairer to compare a simple, unadjusted average to a simple adjusted average rather than to a smoothed aggregate.
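Continuing the sketch above, the re-centring and final averaging might look something like this; the eligibility rule and the two-week "final polls" window are simplified stand-ins for the exclusions described above, and the predicted skew uses the 2016 example:

```python
import pandas as pd  # 'polls' and 'house_effects' carry over from the sketch above

predicted_skew = -0.5  # e.g. 2016: polls expected to under-estimate the Coalition by 0.5 pts

# Only pollsters with more than 5 polls count towards the average house effect
eligible = polls["pollster"].value_counts()
eligible = eligible[eligible > 5].index
shift = house_effects.loc[eligible].mean() - predicted_skew  # e.g. +0.1 - (-0.5) = +0.6
house_effects_centred = house_effects - shift

# Subtract each pollster's re-centred house effect from its published 2pp
polls["adjusted_2pp"] = polls["coalition_2pp"] - polls["pollster"].map(house_effects_centred)

# Final anchored estimate: simple average of the adjusted polls near election day
final_window = polls["mid_date"] >= polls["mid_date"].max() - pd.Timedelta(days=14)
print(f"Anchored final average: {polls.loc[final_window, 'adjusted_2pp'].mean():.1f}%")
```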
As I said above, it’s a fairly quick and dirty model and probably would need some work before it could be used for predictive purposes. However, for the purposes of general comparison, it should work fine (and for back-testing on old elections, I doubt it would differ very much from more sophisticated methods e.g. Bayesian inference).
You can download the raw poll data and adjusted poll data here.
Left-anchored, honeymoon-adjusted models on historical elections
I've run my model on each federal election going back to the 2004 – 2007 term (i.e. the final Howard term). Backtesting further (into the 2001 and 2004 elections) may not be a good idea due to older polling's use of the less-accurate respondent-allocated method of estimating 2pp from first-preference votes (I'd have to decide whether to use pollsters' published 2pp figures or to estimate my own using last-election preference flows). Below is a table of the Coalition 2pp in the final polling average for each election, as well as the final reading from my left-anchored, honeymoon-adjusted model:
Election | Polling average (%) | Anchored model (%) | Result (%) |
---|---|---|---|
2007 | 46.1 | 46.9 | 47.3 |
2010 | 48.5 | 50.4 | 49.9 |
2013 | 53.2 | 53.4 | 53.5 |
2016 | 50.6 | 51.1 | 50.4 |
2019 | 48.6 | 50.6 | 51.5 |
Average error | 1.2 | 0.5 | |
Overall, left-anchoring with a honeymoon adjustment does seem to produce more accurate results. Going through each federal term (purple diamonds = election results, solid line = anchored trendline, translucent grey dots = polls):
2004 – 2007 federal term: Howard’s last hurrah
Starting off, polls conducted right after the election showed very little evidence of a honeymoon for the re-elected Howard government, which my model takes as evidence that the polls likely under-estimated the government, and thus adjusts towards the Coalition (notice how the black trendline is usually slightly above the polls). This turns out to be somewhat correct – of the three final polls taken a week out from the 2007 federal election, one slightly over-estimated the government (Newspoll), one slightly under-estimated the government (Morgan) and one massively under-estimated the government (Nielsen). The final average from my model has the Coalition government at 46.9%, whereas a simple average of the polls would have gotten 46.1% (the Coalition ended up winning 47.3%).
This is one of those elections where left-anchoring the polls probably wouldn't have made much difference, as the polls taken right after the 2004 election are fairly similar to the election result (hovering around the 51 – 54% Coalition mark). Interestingly, although WorkChoices is often cited as the downfall of the Coalition government, it's hard to see its effects in the polling here; it was introduced in mid-2005 and came into effect in early 2006, neither of which looks like a particularly bad period for the Coalition (being at worst slightly behind). It was only with Kevin Rudd's ascension to the leadership at the end of 2006 that the Coalition vote falls apart in polling; even then they manage to recover quite a bit, ending up around 46% on election day.
2007 – 2010 federal term: Kevin Rudd and the Federal Honeymoon
I'm going to guess that this is one of the elections (or the sort of election) Mark Graph refers to when he observes that "a number of aggregators in past election cycles (used) an anchored model and (ended) up with worse predictions". Instead of the approximately 53% won by Labor at the 2007 election, Rudd's government starts off with the mother of all honeymoon bounces as they watch their 2pp run into the 60s against an opposition in disarray. Visually, you can see how left-anchoring the polls to the 2007 election result in this case would have produced complete nonsense; a model which had done so would probably have predicted the Coalition to win over 53% of the 2pp at the 2010 election, instead of the very narrow loss which actually occurred.
Fortunately, with data such as Hawke’s 1984 honeymoon and subsequent under-performance in its training set, the model only partially discounts the Rudd honeymoon (although you can still pretty clearly see how the trendline is a few points more favourable to the Coalition than the polls). The final average from my model has the Coalition narrowly ahead on the 2pp (50.6%), as compared to the final polls which had Labor ahead instead (Coalition 48.5%). In that sense, my model would have “called” the 2pp winner wrongly (final Coalition result in 2010, 49.88%) and would probably have expected a hung parliament or a very narrow Coalition win despite being slightly closer to the 2pp result than the final polls.
2010 – 2013 federal term: Abbott’s Rise
The Gillard government starts out with a small honeymoon bounce consistent with historical expectations, and hence there's fairly little difference between my model's outputs and a simple, unadjusted polling average. An average of the Coalition 2pp in the final polls comes out at around 53.2%, while my model estimates 53.4%; the final result at the 2013 election was 53.49%.
One thing I’ve noticed is that my trendline does tend to smooth out leadership-change-related shifts in the 2pp as being more gradual than they actually were; this isn’t anything to do with my model but rather the LOESS regression I use to generate the trendline (the model just takes polls and adjusts them; in theory I could use an unsmoothed rolling average but felt the trendline was better at discounting noise). For the purposes of demonstration this is probably fine; although if I had to do a prediction or statistical aggregation I would probably take a page out of Poll Bludger/Mark Graph’s books and introduce a discontinuity every time there is a change in the Labor/Liberal leadership (i.e. break the trendline between the old/new leaders).
2013 – 2016 federal term: After All, Why Shouldn’t I Spill It
Oddly enough, despite the Abbott government not getting much of a honeymoon bounce, my model does not adjust the polls towards the Coalition by much (the difference is just half a point). It does seem like data up to that point (i.e. data from the 2013 election and earlier) suggests only a small 2pp over-performance by a government with little in the way of a honeymoon bounce.
The Abbott government starts out fairly strong, although it begins to slide into negative territory and rapidly bottoms out by the new year. The 2pp continues to hover in the 45% – 49% range until Turnbull challenges in September 2015, at which point the Coalition surges to 52% – 56% before slowly trailing off and finishing up at a narrow win. My model would have predicted a Coalition win with 51.1% of the 2pp, while the final polls, at 50.6%, were half a point closer to the final result of 50.36%.
2016 – 2019 federal term: The Supposedly Unloseable Election
The big one everyone wants explained – the polling failure of 2019. Here, we have the Turnbull government starting off by immediately falling behind in the 2pp, which my model reacts to aggressively by assuming the polls are systematically under-estimating the Coalition. As a result, the Coalition 2pp trendline runs above pretty much every poll all the way up till election day, finishing up at around 50.6% as compared to the 48.6% averaged in the final polls (final result 51.53%). As the Coalition barely won a majority at the last election with 50.36% of the 2pp, a 2pp-to-seats conversion would probably expect a close contest with the most likely outcome being the return of the Morrison government.
One interesting effect I noticed in the house effects generated by the model is that of all the pollsters, YouGov/Fifty Acres has the smallest estimated house effect (i.e. it may have been the most accurate pollster of the 2016-2019 term, although without an election there is no way to be sure), which seems to match up with what Mark found when he attempted to anchor his model to YouGov’s result. I will note, though, that this treats the YouGov/Fifty Acres published 2pps as given when in fact the published primary votes would have resulted in the Coalition losing the 2pp even if YouGov/Fifty Acres had gotten the preference flows exactly correct.
Inferring voting intention in the 2016 – 2019 term
Apart from poll aggregation and forecasting election results, anchored models can also be used after an election result is known to infer what voting intention may have looked like all along. (Mark Graph has three such models here, while Kevin Bonham has some right-anchored models here)
Including the 2pp for Labor, we get a graph that looks something like this:
Note that the model is not right-anchored; there's still about a point of 2pp error which isn't explained by this model (reflected in the gap between the final blue line and the 2019 result, represented by the blue diamond on the right). However, the model does end up quite a bit closer to the final result (2% closer), which suggests that a large chunk of the error in the 2019 polls may have been baked in from the start, with some extra error accumulating at some point over the course of the term.
This is what the graph looks like if I include the 2pp estimates published by pollsters over the course of the term:
Interestingly, this suggests that even assuming that voting intention was off throughout the 2016-2019 term, most polls fell within what might be termed the historical margin of error; that is, the margin of error calculated from polling errors prior to 2019, on the assumption that polls would stay as accurate as they previously were. This reflects something interesting – while the polls as a whole were probably more off in 2019 than they have been historically, individual poll errors were not outliers by historical standards.
As a quick example, let's say every poll got the 2019 election as wrong as it did in real life, but instead of all five pollsters over-estimating Labor, three over-estimated Labor and two over-estimated the Coalition:
Pollster | Actual final poll (Coalition 2pp) | Same polls with randomly-signed errors |
---|---|---|
Morgan | 48% | 55% |
Essential | 48.5% | 48.5% |
Ipsos | 49% | 49% |
YouGov/Galaxy | 49% | 49% |
Newspoll | 48.5% | 54.5% |
Average | 48.6% | 51.2% |
Even if the size of the error on each poll had been identical, it wouldn't have mattered in an average as long as some had over-estimated the Coalition and some over-estimated Labor. The problem in 2019 wasn't that the polls were massively inaccurate by historical standards (all would have fallen within the historical margin of error on published 2pps, +/- 3.8%); the problem was that everybody's polls failed in the same direction, whereas at past elections big errors in one direction usually got cancelled out to various degrees by errors in the other direction.
Examples of the latter include the 2001 election (where Morgan's massive 4.5% under-estimate of the Howard government was partly cancelled out by Newspoll's 2% over-estimate of the Coalition; part of Morgan's error was due to its use of respondent-allocated preference flows, which turned out to be faulty, but even using the last-election preference flow method to estimate 2pp figures I find Morgan would still have under-estimated the Coalition 2pp by 3%, comparable to the errors seen at the 2019 election) or the 2007 election (where Nielsen's 4% over-estimate of Labor was partly cancelled out by Newspoll and Galaxy under-estimating Labor by just under a point).
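As an aside, one plausible way to construct a "historical margin of error" of this sort is to take roughly two standard deviations of pre-2019 individual-poll errors; the write-up doesn't spell out the exact formula behind the +/- 3.8% figure, so the sketch below (with a hypothetical error file) is only indicative:

```python
import numpy as np

# Hypothetical file: one pre-2019 final-poll error (published 2pp minus result) per line
historical_errors = np.loadtxt("pre2019_poll_errors.txt")

# ~95% band if the errors are roughly normally distributed and unbiased on average
moe_95 = 1.96 * historical_errors.std(ddof=1)
print(f"Historical margin of error: +/- {moe_95:.1f} pts")
```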
So what does this mean for the next election (2021/2022)?
Honestly, not much.
Remember when I said above that left-anchoring can be challenging as “there is usually a lull in polling conducted right after an election”? Yeah. After the 2019 election, pretty much every pollster disappeared from voting-intention polling for a good two months, meaning that the earliest poll available dropped 70 days after the 2019 federal election (note that that report is out of date, and YouGov has adopted methodology changes to account for the 2019 polling error). This means that the method I use above – anchoring the polls using polls taken up to 14 days after the last election – simply cannot be used here.
As an alternative, I can attempt to model the relationship between the government’s polling bounce at approximately 70 days and the final error. However, doing so demonstrates that the relationship is fairly weak:
As compared to the honeymoon bounce at 2 weeks, using the polls from two months out explains about half as much of the variance in the final polling error (and is not statistically significant). This makes intuitive sense – whether the government is still sitting on a honeymoon bounce two months after an election is probably much more dependent on the particular circumstances of those two months (e.g. did the government do something popular/unpopular, are there any new scandals on the govt/opposition side, is the government/opposition in disarray). This means that it is much harder for any model to anchor polls, as the size of the honeymoon bounce two months after an election will likely be affected more by actual changes in voting intention than by systematic skews in polling.
For what it's worth, however, given a honeymoon bounce of +1.5% (Coalition 2pp in the first post-election Newspoll of 53%, versus the 2019 election result of 51.5%), the regression above expects the final polling average at the next election to under-estimate the Coalition by 0.04% (with a margin of error of +/- 3.3%). In other words, it expects very little skew, although the margin of error on that expectation is so large that for all practical purposes it might be best to simply take the polls at face value instead of attempting any kind of adjustment (especially considering that Newspoll was produced using different methods from Newspolls conducted in 2020 and later).
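For completeness, a point estimate plus a (wide) interval of this sort could be produced from the 70-day regression along the following lines, again with hypothetical file and column names:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# One row per term: 'bounce_70d' = govt 2pp bounce ~70 days after the election,
# 'final_poll_error' = final poll average minus the govt 2pp at the next election
terms = pd.read_csv("bounce_70d_vs_error.csv")

X = sm.add_constant(terms["bounce_70d"])
model = sm.OLS(terms["final_poll_error"], X).fit()

# Predicted final polling skew given a +1.5 pt bounce, with a 95% prediction interval
new_X = np.array([[1.0, 1.5]])  # [constant, bounce]
pred = model.get_prediction(new_X).summary_frame(alpha=0.05)
print(pred[["mean", "obs_ci_lower", "obs_ci_upper"]])
```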