How Meridiem 2021 performed

If you haven’t read the first part of our post-mortem, that’s over here. We’ve also had a look at what factors explained the swing to Labor in the 2021 WA state election.

With the final 2-party-preferred (2pp) figures for the 2021 Western Australia state election available, I’d like to analyse how well our seat-by-seat vote model in Meridiem performed. As I note in Part I, in an environment where one party is ahead by a massive margin (a la the 69.7% 2pp Labor recorded at the state election, smashing all previous state election landslide records), some of our models (e.g. the candidate effects model) will have a lot less impact on a party’s probability of winning a seat, when compared to a uniform swing model. The vote predictions output by said models usually don’t differ by much (in absolute terms) compared to a uniform swing model – for example, a popular local MP is probably worth about 1 – 3% on the 2pp.

In an “average” election (where one side is usually ahead by not much more than 54-46) where such seats are usually fairly close, that 1 – 3% can be worth a lot in terms of improving their chances of holding their seat. For example, in an election where polling suggests that the incumbent Party X is ahead 51-49 in a seat, they have an approximately 59% chance of holding that seat (based on historical swings).

However, if their sitting MP is really popular and Party X’s vote in that seat improves by 3% thanks to their MP’s personal vote, then Party X has an approximately 82% chance of holding the seat.

On the other hand, in a landslide election, such seats tend to be either definitively won or lost anyway (in WA, think seats like Joondalup or Burns Beach), so small shifts in the vote thanks to an MP’s personal vote (the share of voters who are specifically voting for that MP only, and not their party) are not going to make much difference to their party’s chances of winning.
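To make the arithmetic above concrete, here is a minimal sketch of how a margin could be turned into a win probability. It is not Meridiem's actual method: it assumes the error on a seat's 2pp is normally distributed, and the standard deviation of 4.4 points is a value I have picked purely because it roughly reproduces the 59% and 82% figures quoted above. The function name is hypothetical.

```python
import math

# Assumed standard deviation (in 2pp points) of the seat-level error.
# Chosen only because it roughly reproduces the figures in the text.
ASSUMED_SD = 4.4


def win_probability(margin_points, sd=ASSUMED_SD):
    """Chance of holding a seat given the expected lead over 50% (in points).

    Uses the normal CDF, which is an assumption of this sketch.
    """
    return 0.5 * (1.0 + math.erf(margin_points / (sd * math.sqrt(2.0))))


# Close seat: 51-49, then 54-46 after a 3-point personal vote.
close_before = win_probability(1.0)
close_after = win_probability(4.0)

# Landslide seat: 65-35, then 68-32 after the same 3-point boost.
landslide_before = win_probability(15.0)
landslide_after = win_probability(18.0)

print(round(close_before, 2), round(close_after, 2))
print(round(landslide_after - landslide_before, 4))
```

Under these assumptions the same 3-point boost moves the close seat by more than 20 percentage points of win probability, while barely moving the landslide seat at all.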

This is why I find it can be useful to analyse the average error on the predicted 2pp for each district by the various models (in addition to probabilistic scoring I used in the last piece), to see how each model performs when it comes to producing an expected 2pp. Comparing the average 2pp error from the Meridiem forecast to the Basic forecast:


	Average 2pp error
Basic	4.35%
Meridiem	4.03%
Difference (percentage points)	-0.32

Data for the 2pp outputs and error available here. The difference between the error on Meridiem's forecasts and the error on the Basic forecast is statistically significant, at p = 0.032.
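For readers who want to run this sort of comparison themselves, the sketch below shows one way it could be done. It assumes “average 2pp error” means mean absolute error, and it uses an exact paired sign-flip test, which is not necessarily the test behind the p-value quoted above. All seat figures are made up for illustration and the function names are hypothetical.

```python
from itertools import product


def mean_abs_error(predicted, actual):
    """Average absolute gap between predicted and actual 2pp, in points."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def sign_flip_p_value(errors_a, errors_b):
    """Exact two-sided paired sign-flip test on per-seat error differences.

    Only practical for a handful of seats, since it enumerates 2**n cases.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical Labor 2pp figures for five seats (not real data).
actual = [69.0, 55.0, 72.5, 48.0, 61.0]
basic = [64.0, 58.5, 68.0, 52.0, 66.0]
meridiem = [65.0, 57.5, 69.0, 51.5, 65.0]

basic_errors = [abs(p - a) for p, a in zip(basic, actual)]
meridiem_errors = [abs(p - a) for p, a in zip(meridiem, actual)]

print(mean_abs_error(basic, actual), mean_abs_error(meridiem, actual))
print(sign_flip_p_value(basic_errors, meridiem_errors))
```

The test is paired because both forecasts are scored on the same seats, so the relevant question is whether one forecast is consistently closer seat by seat, not whether the two averages differ.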

As the above table notes, the improvement on the error by the Meridiem forecast isn’t very large. Part of this is the fact that the above calculations used the version of Meridiem with the bugs I noted in Part I, and part of it is due to the polling error (the final Newspoll was about 3.7% off on the 2pp), but a large chunk of it is also due to simple randomness.

To put it simply, if there were an obvious and simple model capable of consistently and significantly outperforming the simple models we use as a baseline (in this case, uniform swing), it’s very unlikely that no one else would have spotted the underlying patterns and either produced such a model themselves or corrected for them (a good example of the latter is how “Labour tends to underperform their polls” was historically a good heuristic in the UK, until pollsters attempted to adjust for said bias and ended up overestimating the Conservatives instead). On the other hand, a very complex model may perfectly fit all the data-points in a given sample but end up much worse at actually making predictions of future events (in statistics, we refer to such a model as “overfitted”), which is why I tend to favour relatively simple models, especially in a notoriously random and complex field such as politics.

This could plausibly lead to us missing some factor which could be used to improve our forecasts, or holding off on including said factor until we gain more evidence. However, given the sheer variety of potential variables we could cherry-pick from to build our model (finding good explanatory variables in election modelling is not so much finding a needle in a haystack as it is finding a steel needle in a haystack made from iron wool), it’s important to be very selective about which variables we use and ensure that they have both a solid theoretical grounding as well as demonstrating actual improved performance over models which don’t contain said variables.

Analysing how various versions of Meridiem with certain components turned off fared compared to a version of Meridiem with all components activated:


	Average 2pp error
Meridiem	4.04%
Meridiem (Lean module disabled)	4.14%
Meridiem (uniform swing)	4.39%
Meridiem (no personal vote effects)	4.51%

Data for the 2pp outputs and error available here.

The above data seems to confirm the probabilistic scoring in Part I – each module had the effect of improving the forecast, with some modules apparently being more useful than others. In particular, I want to analyse two modules – elasticity and candidate effects – because I think there could be room for improvement:

Modelling differences in electorate elasticity

(As noted in Part I, there was a bug in the elasticity model which meant that the elasticity in Churchlands was very likely underestimated. I’m not particularly interested in that as I think attempting to improve how the models work overall – instead of purely bugfixing when they don’t do what they’re meant to – is more interesting especially for anyone else who is curious about election modelling)

Another way to measure the performance of the elasticity module is to analyse how the swings in each electorate compared to what would be predicted solely from the estimated elasticity in each electorate. Since the purpose of the elasticity model is to tell us which seats are more likely to deliver above-average swings, we can simply graph the predicted difference between the statewide swing and the swing in each seat against the actual difference:

Predicted deviation from statewide swing vs actual deviation from statewide swing — Negative values mean that the seat swung more to Labor than expected, while positive values mean that the seat swung more to the Liberals/Nationals than expected. The dashed black line represents what we would expect to see under uniform swing. R² = 0.03, p = 0.18

The average error on a pure uniform-swing model would be about 3.82%, while the error on the elasticity-only model is about 3.76%.

Wait – that’s a very tiny difference! The R² is minuscule! It’s not statistically significant!

Yes, all of the above criticisms are technically true. However, the above statistics (the p-value especially) are calculated on a sample of just this one election. In back-testing, using an elasticity estimate produced using data only available prior to the relevant election (e.g. if I wanted to test if an elasticity model would have worked for the 2008 WA state election, I would only allow it to “see” data from the 2005 WA state election and older) produces a very consistent improvement of about 0.04% to 0.10% on the 2pp predictions. Normally, that alone would not be adequate proof of a model’s significance – you might just be overfitting a relationship to past data, which then falls apart when applied to future elections – but in this case, given how consistent the performance improvement is in predicting out-of-sample data, and the fact that it appears in other Australian elections (I’ve tested it on the Queensland state elections, and a much simpler model on the Australian federal elections), I think it’s reasonable to say that there genuinely is some sort of elasticity in Australian electorates, and that said elasticity remains somewhat constant between elections (though the true elasticity is hard to calculate due to other factors).
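As an illustration of the back-testing rule described above, here is a minimal sketch. It assumes elasticity is defined as the ratio of a seat's swing to the statewide swing and estimates it by least squares through the origin. That estimator, the function names and every number below are my own assumptions for illustration, not the method or data used in Meridiem.

```python
def estimate_elasticity(history, before_year):
    """Estimate a seat's elasticity using only elections before `before_year`.

    `history` is a list of (year, statewide_swing, seat_swing) tuples.
    Falls back to 1.0 (uniform swing) if there is no usable past data.
    """
    past = [(state, seat) for year, state, seat in history if year < before_year]
    denom = sum(state * state for state, _ in past)
    if denom == 0:
        return 1.0
    return sum(state * seat for state, seat in past) / denom


# Hypothetical swings to Labor for one seat (not real data).
history = [
    (2001, 6.0, 9.0),
    (2005, -2.0, -3.0),
    (2008, -4.0, -5.5),
]

# Back-test 2008: the model may only "see" 2005 and older.
elasticity = estimate_elasticity(history, before_year=2008)
statewide_swing_2008 = -4.0
actual_seat_swing_2008 = -5.5

elastic_error = abs(elasticity * statewide_swing_2008 - actual_seat_swing_2008)
uniform_error = abs(statewide_swing_2008 - actual_seat_swing_2008)
print(elasticity, elastic_error, uniform_error)
```

The important part is the `year < before_year` filter: the 2008 result is in the list, but it never influences the estimate used to predict 2008.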

(That’s not to mention the extreme inelasticity or elasticity of certain electorates. A good example of the former at the federal level is the ACT, whose 2pp barely budges from election to election no matter what the nationwide vote looks like, while Queensland appears to be a fairly elastic state at the federal level, tending to swing much harder than the rest of the nation.)

And yes, the improvement afforded by the elasticity model is very small. At the same time, given the evidence, I think it’s useful to attempt to model and estimate the elasticity of each electorate and include that in producing 2pp forecasts for each electorate.

I will also note that in the above graph, some of the seeming outliers are actually confounded, especially the group around the middle-left of the graph. Many of those electorates are usually-elastic Labor electorates whose members retired before the election; as I note elsewhere, incumbent retirements were highly correlated with a party getting less of the vote in an electorate than they do statewide.

(One of the exceptions is the point right above the dashed line, at about -2 on the horizontal axis. That electorate is Geraldton, whose member defected from the Liberal party to the National party and brought a chunk of his voters with him. Because Liberal voters tend to preference National candidates at a much higher rate than the inverse (in a contest against Labor, about 85% of Liberal votes usually end up with the National, while 75% of National voters usually preference the Liberals in WA), Labor tends to do worse in a Labor vs National contest as compared to a Labor vs Liberal contest. This is obviously not something an elasticity model could account for; but in our simulations, we found that the net gain by the conservative side was about 1% on the 2pp when the contest ended up as Labor vs National compared to when the contest ended up as Labor vs Liberal. Adjusting for this would bring the result more into line with the prediction made by the elasticity model.)

If we adjust for retirements, and other personal-vote effects, the predicted swing becomes much closer to the actual swing:

Predicted deviation from statewide swing vs actual deviation from statewide swing adjusted for candidate effects — Negative values mean that the seat swung more to Labor than expected, while positive values mean that the seat swung more to the Liberals/Nationals than expected. The dashed black line represents what we would expect to see under uniform swing. R² = 0.13, p = 0.0057

Once candidate effects are adjusted for, the average error on the elasticity model is about 3.66%, for an improvement of about 0.2%.

More interestingly, I’ve been experimenting with a model of elasticity which adjusts dynamically based on the statewide 2pp. The reasoning for this is pretty simple – if you have a seat which leans towards one side (say, the district of Armadale, which was the most Labor-leaning electorate going into the election), and an environment where that side is winning by a big margin, there’s only so far you can extrapolate a uniform swing before you start getting into the territory of the impossible (e.g. if somehow Labor won a 13% swing to them at the next election, uniform swing would bring Rockingham to more than 100% 2pp for Labor, or if the Liberals/Nationals won more than 70% of the 2pp, a similar situation would happen with Roe and the Nationals).

Another way to think about this is that for a party to win a massive share of the vote overall, they will have to win a significant proportion of that vote in seats which usually lean to the other side. In such elections, said party will likely have a massive proportion of the vote in seats which are usually safe for them, and to win an even greater share of the vote in heartland seats (to power their overall vote shares) is much more difficult thanks to rusted-on voters for the other side as well as voters who intentionally vote for the underdog and voters who randomly fill out their ballot. Hence, in massive landslides, it is more likely that the party’s large majorities will be powered by voters in seats which usually lean to the other side and marginal seats than that they will win roughly the same swings both in their safe seats and everywhere else.

I’ve developed a simple dynamic elasticity model which adjusts elasticity such that the model always reaches 100% of the vote in every electorate if that party wins 100% of the vote statewide (obviously not realistic for something like that to happen, but it’s simply a way to calibrate the model in the absence of any 70+% landslide elections). From an empirical standpoint, it actually improves on the existing elasticity model:
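To show what a constraint like this can look like in practice, here is one possible sketch. It is not the dynamic elasticity model described above, whose details I haven't laid out here. It assumes a hypothetical rule in which each seat closes a share of its remaining “headroom” to 100%, with elasticity controlling how quickly. The function name and all figures are illustrative only.

```python
def dynamic_swing_2pp(seat_prev, state_prev, state_new, elasticity=1.0):
    """Project a seat's 2pp (in %) so that it hits 100% when the state does.

    All inputs are one party's 2pp in percent. `elasticity` > 1 means the
    seat closes its remaining gap faster than the state as a whole.
    """
    if not 0.0 < state_prev < 100.0:
        raise ValueError("state_prev must be strictly between 0 and 100")
    if state_new >= state_prev:
        # Share of the statewide headroom to 100% that has been closed.
        frac = (state_new - state_prev) / (100.0 - state_prev)
        return 100.0 - (100.0 - seat_prev) * (1.0 - frac) ** elasticity
    # Swing against the party: mirror the rule towards 0%.
    frac = (state_prev - state_new) / state_prev
    return seat_prev * (1.0 - frac) ** elasticity


# Hypothetical example: statewide 2pp goes from 55% to 70%.
safe_seat = dynamic_swing_2pp(75.0, 55.0, 70.0)      # uniform swing gives 90
opposing_seat = dynamic_swing_2pp(40.0, 55.0, 70.0)  # uniform swing gives 55
print(round(safe_seat, 2), round(opposing_seat, 2))
```

In this sketch the safe seat swings less than the statewide 15 points and the seat leaning to the other side swings more, which matches the intuition that a landslide has to be powered by the seats with the most room to move.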

Predicted deviation from statewide swing using dynamic elasticity, vs actual deviation adjusted for candidate effects — As with the previous graph, all data has been adjusted for candidate effects. Negative values mean that the seat swung more to Labor than expected, while positive values mean that the seat swung more to the Liberals/Nationals than expected. The dashed black line represents what we would expect to see under uniform swing. R² = 0.27, p < 0.0001

The average error on the swing predicted by the dynamic elasticity model is about 3.45%, as compared to the 3.66% for the static elasticity model mentioned above and 3.82% for the uniform-swing model (and for those interested in correct-call rates, it would have correctly called Labor’s narrow win in Warren-Blackwood, although I maintain that the difference between “narrow Nationals hold” and “narrow Labor win” isn’t that great and is not evidence for or against a model by itself).

I will probably look into this further, and explore various models of dynamic elasticity. The only concern I have here is that attempting to model how electorates behave at unprecedented landslides is a bit dangerous considering we have no 80+% 2pp elections to calibrate any model to. At the same time I don’t believe that that’s a valid reason to not attempt to build a reasonable model, as long as we understand that we should be less confident in it than usual. In the same way, it would be counterproductive to pretend polls have no predictive value in massive landslide scenarios (or whatever election the pundits have decided to deem “uncharted waters”) just because pollsters have never polled in an environment like that before. Many of the same principles should still apply, after all.

Modelling candidate effects

One interesting thing I noticed at this election was that the candidate effects seemed to be significantly larger compared to what I expected given historical swing data. While the party-gain (what other psephs refer to as double-sophomore) swings were just 1.2% more than expected, the retiring incumbent effect was more than quadruple the expected swing. I have two hypotheses for this:

Personal vote effects might be amplified at massive landslide elections

The personal vote effects seen at this election are significantly higher than those seen historically (figures for this election available here). Historically, a retirement usually costs the retiring MP’s party about 1% on the 2pp, while a last-election gain (aka “double-sophomore” or “sophomore surge”) translates to an improvement of just under 2% on the 2pp. Although I haven’t looked into it yet myself, William Bowe’s BludgerTrack 2019 suggests that similar figures hold at the federal level, which makes it odd that Labor candidates running in seats with retiring Labor incumbents underperformed by 4.3% and Labor MPs first elected at the 2017 election who won their seats from another party outperformed by 3.2%.

I’ll preface this by noting that given how few massive landslide elections we have quality data for, we need to be aware that any effects might be a unique-to-the-issues effect and not a massive-landslide-elections effect. Treat the conclusions here with an appropriate dose of uncertainty.

In the 2012 Queensland state election, where the Liberal-National Party (LNP) won a 13.7% swing to it and an estimated 62.8% of the 2pp vote (the Electoral Commission of Queensland does not go through all the ballots to produce official 2pp estimates), electorates that the LNP had just won from Labor in the previous election swung 3.9% harder to the LNP than the rest of the state, fairly similar to the 3.2% seen at WA 2021. (There were only three such seats at the 2011 NSW state election, so I didn’t bother calculating an average for those).

Similarly, at the 2012 QLD state election, seats with a retiring Labor incumbent saw a 4% greater swing to the LNP than in the rest of the state. Ditto in the 2011 NSW state election (where the Coalition won 64.22% of the 2pp), where seats with retiring Labor incumbents saw an average 4.3% greater swing to the Coalition than in the rest of NSW. Again, both figures are fairly similar to the 4.3% seen at the 2021 WA state election, suggesting that personal vote effects may be amplified at massive landslide elections.

Personal vote effects might work differently to what we expected

Our personal vote effects model works on two assumptions:

Firstly, for a given party, that every candidate’s personal vote is roughly the same. While this assumption is obviously untrue – a variety of factors can affect personal vote, ranging from name recognition to things like fit for the district – without seat polling ahead of the election, it’s not possible to accurately estimate each MP’s personal vote. Hence for the purposes of modelling personal vote effects, we assume that every MP has, or had, broadly the same personal vote.

Secondly, we assume that personal vote effects for Labor vs Liberal/National contests work like so (minor party/independent incumbents have a much stronger personal vote):

Personal vote effects = (Change in personal vote of Labor) + (Change in personal vote of Liberals/Nationals)

What this means is that, for example, in a seat where the local Labor MP is retiring, we expect the personal vote effects to look something like this:

Personal vote effects = (Loss of Labor incumbent’s personal vote) + (No change in personal vote of Lib/Nat)

Since we estimate personal vote for each MP to be equal to about 1% on the 2pp, that would translate to:

Personal vote effects = 1% + 0 = 1% lower 2pp for Labor

On the other hand, in a seat where the local Labor MP was defeated by a Liberal/National challenger at the last election, we expect the personal vote effects to look something like this:

Personal vote effects = (Loss of Labor incumbent’s personal vote) + (New Lib/Nat incumbent’s personal vote)

(Keep in mind that since election modelling works off swings from the last election, these changes refer to changes in the presence/absence of personal votes since the last election)

Which would translate to:

Personal vote effects = 1% swing against Labor + 1% swing to Lib/Nat = 2% lower 2pp for Labor

Once all such effects have been taken into account, we then adjust the 2pp vote of all seats such that the overall 2pp matches our model’s statewide 2pp prediction (the purpose of the candidate effects model is to tell us where each party is likely to over-/underperform the statewide swing, not tell us what the statewide swing is).
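The two assumptions and the final re-centring step can be sketched as follows. This is a simplified illustration rather than the actual Meridiem code: it assumes equal-sized electorates (so the statewide 2pp is a plain average of seats), and the seat names, base figures and function names are hypothetical.

```python
PERSONAL_VOTE = 1.0  # points of 2pp per incumbent, as estimated in the text


def labor_effect(labor_change, libnat_change):
    """Net effect on Labor's 2pp from personal-vote changes since last election.

    Each argument is -1 (incumbent's personal vote lost), 0 (no change)
    or +1 (new incumbent's personal vote gained).
    """
    return PERSONAL_VOTE * (labor_change - libnat_change)


def apply_candidate_effects(base_2pp, effects, statewide_target):
    """Add per-seat effects, then shift every seat so the mean hits the target.

    Assumes equal-sized electorates, which is a simplification.
    """
    adjusted = {seat: base_2pp[seat] + effects.get(seat, 0.0) for seat in base_2pp}
    shift = statewide_target - sum(adjusted.values()) / len(adjusted)
    return {seat: value + shift for seat, value in adjusted.items()}


retirement = labor_effect(labor_change=-1, libnat_change=0)      # Labor MP retires
lost_last_time = labor_effect(labor_change=-1, libnat_change=1)  # Lib/Nat gain at last election

# Hypothetical Labor 2pp before candidate effects.
base = {"Seat A": 60.0, "Seat B": 55.0, "Seat C": 50.0}
effects = {"Seat A": retirement, "Seat C": lost_last_time}
result = apply_candidate_effects(base, effects, statewide_target=55.0)
print(retirement, lost_last_time, result)
```

Note that Seat B, which has no candidate effect of its own, still moves slightly after re-centring. That is the point of the last step: candidate effects only redistribute the vote between seats, they do not change the statewide total.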

The reason I wonder if this model may not be accurate is because of the much larger than expected personal vote effects seen with retirements. According to the figures mentioned above, Labor’s underperformance in seats where its incumbents retired (4.3%) was larger than its overperformance in seats it had just gained at the last election (3.2%), whereas our model expects that retirement effects should be half that of “double-sophomore”/“sophomore surge” seats – i.e. those that Labor just gained at the last election.

This pattern also held in the 2012 Queensland state election figures we mentioned above, and recently at the 2020 Queensland state election, Dr Kevin Bonham finds that seats with a retiring incumbent underperformed the statewide swing by 3.3% while “double-sophomore” seats overperformed by 2.0%. However, at the federal level, the regression analyses cited by BludgerTrack 2019 suggest that the personal vote effects produced by retirements should be about two-thirds that of “sophomore-surge” seats (1.0% vs 1.4%), so I’m not entirely sure what’s driving the significantly larger underperformance associated with retiring incumbents at the state level.

One possibility might be the smaller electorates at the state level, which more easily allow a local incumbent to get to know and be in contact with a larger chunk of their voters (I might do an analysis of electorate size vs estimated personal vote effects to see if this is the case). Anecdotally, incumbents in the Northern Territory Legislative Assembly, where electorates have just 2000 – 5000 voters apiece, tend to have very strong personal votes (although that may be confounded by the presence of candidate photos on the NT ballot).

Another possibility might be that, to put it simply, all effects are muted at the federal level. After all, the biggest 2pp win at the federal level is 58.5%, compared to the multiple 60+% 2pp landslides seen at the state level; and the average federal election has a 2pp of about 51.7% since 1983 (which is when we start having full distributions of preferences instead of estimates) while by the same measure the average state election sees a 2pp of 54.7%. Whether because partisan loyalties are stronger at the federal level or some other reason, it might just be that swings and personal votes are larger at state elections; though in the case of personal votes it can be hard to disentangle this from the smaller-electorates hypothesis above.

So, that concludes our WA 2021 analysis. We hope you’ve found our WA 2021 forecast and our analyses informative so far. Our thanks go to Antony Green (for his analysis of state elections, and the data he helps to produce, e.g. redistribution and 2pp estimates), Dr Kevin Bonham (for his psephological pieces – anything with a self-declared Wonk Factor of greater than 3/5 is always very informative – and for providing us with the poll data we needed to build our models and analyses), Ben Raue of The Tally Room (for his analyses and for providing us with his painstakingly-processed historical state election data) and William Bowe of The Poll Bludger (for coverage of polling and the election, and for providing us with the historical polls needed to build our models and analyses).

On the subject of election data, if I may venture to allow some personal opinions into this for a moment, I find it ridiculous that in the current day, many state Electoral Commissions still use such formats as PDFs and picture files (PNGs) to publish election data.

The data is still publicly available, after all, so it’s not like there’s a privacy issue or some other concern; and to produce the tables printed in the PDFs and PNGs published by said Electoral Commissions, you almost certainly have the data in a more accessible format somewhere else (spreadsheet, database etc) anyway. Why not release it in that format?

(And yes, I have asked an Electoral Commission before about this; they responded to my inquiry saying they did not release it in some other format and ignored my other question as to why. They also proceeded to ignore a small request on my part that in future PDF releases they enable all the borders on the page (i.e. give all cells a black border, in the tables in their PDFs) which would allow for processing by PDF applications (which, by the way, was their suggestion in response to my question – use a PDF writer to export to other formats).)
