Introduction
With everything that’s been going on in the world, Australians may be forgiven for not knowing that South Australia goes to the polls on 19 March 2022. Unlike Western Australia, where the state government all but wiped out the opposition, the South Australian election looks to be a much more “normal” state election, with a state government facing federal drag, the Liberal Party battling to snag seats from defectors, and a significant degree of electoral boundary redrawing by the Electoral Districts Boundaries Commission (EDBC).
A Fallen Icarus
One of the factors in this mix has been the decline of Nick Xenophon’s former party, SA Best (formerly the Nick Xenophon Team [NXT] and Centre Alliance [CA]). The most recent polling of SA Best voting intention in the House of Assembly had it fluctuating in the mid-single digits, a far cry from the near-20% figures it was polling before the last election, the 21.3% it snagged at the 2016 federal election, or even the 14.1% it won at the last state election. To go from a grouping which once won more votes than a major party at a federal Senate election (in South Australia) to fielding just one candidate in the state which should be its stronghold is, well, quite the turnaround.
South Australia has historically been fairly supportive of parties which position themselves in the political centre, with the historical Australian Democrats doing better in SA than in the rest of Australia:
So the near-complete disappearance of SA Best from the lower house contest sets up a fairly interesting dynamic. Will those voters return to the major parties in similar proportions to how their preferences flowed at the last election (51.6% Liberal, 48.4% Labor)? Will they switch to voting for the opposition to express frustration with the status quo? Or, given that SA Best won the most votes in safe Liberal seats (of SA Best’s 5 best electorates, not one has a Liberal 2-party-preferred of less than 60%), will they disproportionately switch to the Liberals?
How our model works – a summary
Just so I don’t make people sit through another 10,000-word methodology writeup, here’s a summary of how the model works (bits that differ from our WA 2021 model, Meridiem 2021, are bolded):
First, it takes recent polling on South Australian state voting intention, if any is available. If not, it combines old voting intention polling with data from recent Premier approval polling and a federal drag model to obtain an estimate of forecasted voting intention (if this happens, the uncertainty in the forecast is much higher than usual).
Next, it uses historical data on primary vote polling errors to generate a probability distribution for each party’s primary vote. Unlike Meridiem 2021, we use a beta distribution to simulate random polling errors for each party’s primary vote, with the alpha and beta shape parameters estimated using the method of moments.
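To illustrate that step, here’s a rough sketch (not the production code) of simulating primary votes from beta distributions whose shape parameters come from the method of moments. The poll figures and historical error sizes below are invented, and the final re-normalisation is just one simple way of keeping each simulation’s primaries summing to 100%.

```python
# Rough sketch: simulate primary-vote polling error with beta distributions whose
# shape parameters are derived from a mean and sd via the method of moments.
# All figures below are invented for illustration.
import numpy as np
from scipy import stats

def beta_from_moments(mean, sd):
    """Return (alpha, beta) for a Beta distribution with the given mean and sd."""
    common = mean * (1 - mean) / sd**2 - 1      # requires sd^2 < mean * (1 - mean)
    return mean * common, (1 - mean) * common

rng = np.random.default_rng(2022)
polled   = {"ALP": 0.36, "LIB": 0.40, "GRN": 0.10, "OTH": 0.14}     # hypothetical primaries
error_sd = {"ALP": 0.025, "LIB": 0.025, "GRN": 0.015, "OTH": 0.03}  # hypothetical error sizes

sims = {}
for party, share in polled.items():
    a, b = beta_from_moments(share, error_sd[party])
    sims[party] = stats.beta(a, b).rvs(size=10_000, random_state=rng)

# Re-normalise so each simulation's primaries sum to 100% (illustrative step only)
total = sum(sims.values())
sims = {party: draws / total for party, draws in sims.items()}
```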
The model then uses historical data on two-party (LIB-vs-ALP) preference flows to simulate possible variation in the two-party preference flows. These randomly generated preference flows are then used to estimate two-party-preferred (2pp) vote shares for each simulation.
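Concretely, the 2pp step works something like the sketch below; the party labels and preference flow rates here are hypothetical rather than the model’s simulated values.

```python
# Rough sketch: apply simulated preference flows to primary votes to get a 2pp.
def labor_two_party_preferred(primaries, flow_to_alp):
    """primaries: party -> primary share (sums to 1).
    flow_to_alp: non-major party -> share of its preferences flowing to Labor."""
    alp_2pp = primaries["ALP"]
    for party, share in primaries.items():
        if party not in ("ALP", "LIB"):
            alp_2pp += share * flow_to_alp[party]
    return alp_2pp                              # Liberal 2pp is 1 - alp_2pp

example = labor_two_party_preferred(
    {"ALP": 0.36, "LIB": 0.40, "GRN": 0.10, "OTH": 0.14},
    {"GRN": 0.80, "OTH": 0.45},                 # hypothetical preference flows
)
# 0.36 + 0.10*0.80 + 0.14*0.45 = 0.503, i.e. a Labor 2pp of about 50.3%
```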
Once vote share simulations are complete, the model proceeds to look at each electoral district’s voting history and patterns. Where Meridiem 2021 estimated the lean (e.g. how much more does Croydon tend to vote Labor than the rest of the state?) and the elasticity of each electorate, this model attempts to develop a function which “predicts” the vote for each grouping (Labor/Liberals/Greens/Others), given a certain statewide vote for that grouping. For statewide vote shares fairly close to the historical average for each party, this model does not differ much from uniform swing (with an exception which I shall discuss in a follow-up article). However, it fixes some of the bugs with our elasticity model from Meridiem 2021 and is (in my view) less likely to produce strange results if the result becomes lopsided in either direction. For example, if the Labor or Liberal 2pp gets above 76%, uniform swing would produce 2pp estimates greater than 100% for Croydon (on the Labor end) or Flinders (on the Liberal end).
Instead of using a linear function (as in uniform swing, or uniform swing + elasticity), we use a beta CDF to translate an expected statewide vote to an expected district vote. It’s more difficult to fit, but in my view it’s less likely to produce impossible figures in lopsided elections.
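Here’s a minimal sketch of the idea; the shape parameters are invented purely to show the behaviour, since the real ones are fitted per district (see the section on replacing the elasticity model below).

```python
# Minimal sketch: translate a statewide vote share into an expected district vote
# share via a beta CDF rather than a straight line.
from scipy import stats

def expected_district_vote(statewide_share, a, b):
    """Map a statewide share (0-1) to an expected district share (0-1)."""
    return stats.beta.cdf(statewide_share, a, b)

# Unlike uniform swing, the output can never drop below 0% or exceed 100%
for statewide in (0.40, 0.50, 0.60, 0.76, 0.90):
    print(statewide, round(expected_district_vote(statewide, a=2.0, b=3.0), 3))
```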
The model then adjusts for candidate effects, e.g. incumbency advantage. In particular, South Australia has seen many major party incumbents defect to the crossbench and go on to hold their seats. Parties or individual candidates can outperform a “generic” candidate if they’re incumbents, are high-profile (think Xenophon, Pauline Hanson etc.) or have prior electoral experience in the electorate.
For each simulated set of statewide vote shares, the “expected” vote share for each party/grouping (Labor/Liberals/Greens/Others) is generated using the previously-fitted function. For the Others grouping specifically, our model also looks to history to generate the “expected” vote share for a generic Others candidate given 1) the number of Others candidates and 2) the statewide Others vote share. While a larger field of Others candidates tends to increase the total Others vote share in a district, I’ve found that, unless a candidate has particular candidate effects (see previous paragraph), adding Others candidates tends to reduce the average vote per Others candidate.
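As a purely qualitative illustration of that pattern, the toy curve below grows sub-linearly with the number of Others candidates, so the total rises while the per-candidate average falls; the functional form and exponent are invented, not fitted.

```python
# Toy illustration only: total Others vote rises with more candidates, but the
# average vote per Others candidate falls. The exponent is invented.
def district_others_total(statewide_others, n_candidates, exponent=0.4):
    """Total district Others vote as a sub-linear function of candidate numbers."""
    return statewide_others * n_candidates ** exponent

for n in (1, 2, 3, 4):
    total = district_others_total(0.10, n)
    print(n, round(total, 3), round(total / n, 3))   # total up, per-candidate down
```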
Unlike Meridiem 2021, we don’t generate correlated deviations from the statewide swing in this model. In my view, the inter-electorate correlation model is still a good idea (see our scoring of the WA 2021 model for the average 2pp error without it). However, given the fairly drastic redistributions that have taken place in South Australia over the last two decades, I’m not convinced that we can estimate inter-electorate correlations or even demographic data with the degree of accuracy needed to construct a reasonably accurate model.
Hence, in this model, all deviations from statewide swing are assumed to be uncorrelated. These are generated based on an average of:
- The historical average deviation from statewide swing across the state for each grouping.
- The historical average deviation from statewide swing in each electorate.
- Expected average deviation from statewide swing, for a given expected vote share. Historical data shows that generally as a party/grouping’s vote approaches 0%, the amount it tends to differ from statewide swing goes down.
A beta distribution for each party/grouping is then generated from the expected vote in each simulated district and the expected variance, using the method of moments, and a random deviate is drawn from it in each simulation. All simulations are then readjusted so that the primary vote shares for all parties/groupings match the primary vote shares of their respective simulation, and so that the primary vote shares in each simulated district sum to 100% with no negative vote shares. If the two conflict, the latter takes precedence. The adjustments required by the former (ensuring all districts in a simulation match up to that simulation’s generated primary votes) are generally very minor, however (on the order of 0.001%–0.1%).
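Put together, the deviation step looks roughly like the sketch below: a method-of-moments beta for each district/party pair, followed by a simple re-scaling so each district’s primaries sum to 100%. The expected votes and variances here are invented, and the re-scaling shown is only the simplest version of the readjustment described above.

```python
# Rough sketch: uncorrelated district deviations drawn from method-of-moments betas,
# then rescaled so each simulated district's primaries sum to 100%.
import numpy as np
from scipy import stats

def beta_from_moments(mean, sd):
    common = mean * (1 - mean) / sd**2 - 1
    return mean * common, (1 - mean) * common

rng = np.random.default_rng(0)
# Expected district primaries (rows = districts; columns = ALP, LIB, GRN, OTH)
expected = np.array([[0.45, 0.35, 0.12, 0.08],
                     [0.30, 0.52, 0.10, 0.08]])
sd = np.array([[0.03, 0.03, 0.02, 0.02],
               [0.03, 0.03, 0.02, 0.02]])

draws = np.empty_like(expected)
for i in range(expected.shape[0]):
    for j in range(expected.shape[1]):
        a, b = beta_from_moments(expected[i, j], sd[i, j])
        draws[i, j] = stats.beta(a, b).rvs(random_state=rng)

# Rescale so each district's primaries sum to exactly 100%
draws /= draws.sum(axis=1, keepdims=True)
```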
Next, we attempt to simulate which two candidates will make it into the final two count. To start off, if a party has more than a third of the simulated vote, it is included in the final two count, as it is mathematically impossible for that party to be eliminated from the count. Next, if the parties and groupings other than the top two (on primary vote) combine for less than a third of the vote, they are eliminated and the top two primary-vote-getters are used for the final two count. Let’s say Labor wins 34%, the Liberals win 34%, and all other candidates combine for 32% of the vote.
Even if every single vote for a minor/independent candidate transferred to another minor/independent candidate (which electoral history shows to be so unlikely as to be not worth considering), there is no way any minor or independent candidate could make it into the final two. Finally, for all remaining simulated districts where there is uncertainty over which two candidates will end up in the final two, we use a logistic regression similar to the one we used in Meridiem 2021 to randomly select final-two matchups from the remaining candidates.
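Those three rules translate roughly into the sketch below, with the logistic-regression step left as a placeholder; the candidate shares are just the worked example from above.

```python
# Hedged sketch of the final-two selection rules described above; the
# logistic-regression step is represented by a placeholder callable.
def final_two(primaries, pick_remaining):
    """primaries: dict of candidate -> primary share (summing to 1)."""
    ranked = sorted(primaries, key=primaries.get, reverse=True)
    locked = [c for c in ranked if primaries[c] > 1 / 3]     # can never be eliminated
    if len(locked) >= 2:
        return locked[:2]
    others_total = sum(primaries[c] for c in ranked[2:])
    if others_total < 1 / 3:
        return ranked[:2]             # nobody outside the top two can catch up
    return pick_remaining(primaries)  # genuinely uncertain: defer to the regression model

# The worked example from above: ALP 34%, LIB 34%, everyone else 32% combined
print(final_two({"ALP": 0.34, "LIB": 0.34, "GRN": 0.18, "IND": 0.14},
                pick_remaining=lambda p: "uncertain"))
```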
Finally, for each simulated district, we estimate the two-candidate-preferred between the final two candidates.
For “classic” (Labor-vs-Liberal) contests, this is estimated using the randomly-generated two-party preference flows in each simulation and the historical two-party lean of each district. The latter has been added to account for electorates where preference flows to one side have historically been stronger than the statewide preference flow to that side (e.g. an electorate where Green voters tend to preference Labor at higher rates than Green voters in the rest of the state).
For “non-classic” contests (e.g. Labor-vs-Independent, Liberal-vs-National, Liberal-vs-Independent), we attempt to estimate the two-candidate-preferred using the primary votes and the historical distribution of preferences between similar candidates.
The simulated “winner” is then determined by which candidate has more than 50% of the two-candidate-preferred, and the data are aggregated to produce seat distributions and other figures/tables.
Changes in our model – more detail
Large chunks of our model are fairly similar to the WA 2021 model, so I have decided to elaborate on the parts that we’ve changed instead.
Slightly different fundamentals model
Unlike the previous model, which relied solely on an average of federal drag results, our new model combines both federal drag and government age to produce a more accurate “fundamentals” prediction.
Given the relative youth of the Marshall Liberal government, this basically pulls the forecast closer to 50/50 than a pure “federal-drag-only” forecast would. I again considered including economic variables, but once more found their predictive value to be fairly weak.
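To give a flavour of what such a blend could look like, here’s a purely hypothetical sketch; the weight, the per-year age penalty and the functional form are all invented for illustration and are not the model’s fitted parameters.

```python
# Hypothetical sketch of a "fundamentals" estimate blending federal drag with
# government age. All coefficients below are invented for illustration.
def fundamentals_2pp(federal_drag_2pp, years_in_office,
                     drag_weight=0.7, age_penalty_per_year=0.003):
    """Return an expected incumbent 2pp (0-1) from the two fundamentals inputs."""
    age_term = 0.5 - age_penalty_per_year * years_in_office   # older governments drift down
    return drag_weight * federal_drag_2pp + (1 - drag_weight) * age_term

# A young government gets pulled closer to 50/50 than a drag-only estimate would be
print(fundamentals_2pp(federal_drag_2pp=0.46, years_in_office=4))   # ~0.468 vs 0.46
```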
Choice of probability distribution for primary votes
In the previous model, we used Gosset’s t-distribution (more commonly known as Student’s t) to simulate the vote shares for major parties and a log-t distribution for minor parties. However, as neither distribution is bounded to the 0–100% range, both can occasionally produce deviates below 0% (Gosset’s t only) or above 100%. This possibility is amplified for parties which poll close to 0% or 100% of the vote, and for parties or groupings with an unusually large error size.
For example, if One Nation polled 10% right before a Queensland state election, given the historical average One Nation error size in Queensland polls, below are the normal, Gosset’s t and beta distributions:
As the beta distribution is bounded to [0, 1], it appears to be a more appropriate probability distribution than the normal distribution or Gosset’s t. Additionally, the tendency for the beta distribution to become skewed as it approaches 0% or 100% (notice how, in the graph above, the beta distribution has a longer right tail) is a more accurate representation of polling errors for small-but-polling-error-prone parties/groupings.
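A small sketch of that comparison (with an illustrative error size of 4 points and an arbitrary choice of degrees of freedom for the t) shows the key difference: the normal and t put real probability on impossible vote shares, while the method-of-moments beta does not and is right-skewed near zero.

```python
# Sketch of the comparison described above: a party polling 10% with a given
# historical error size, under normal, Student's t and beta distributions.
# The error size and the t degrees of freedom are illustrative only.
from scipy import stats

mean, sd, df = 0.10, 0.04, 5

# Method-of-moments beta parameters for the same mean and standard deviation
common = mean * (1 - mean) / sd**2 - 1
a, b = mean * common, (1 - mean) * common

# Probability each distribution assigns to an impossible (< 0%) vote share
print("normal:", stats.norm.cdf(0, loc=mean, scale=sd))
print("t:     ", stats.t.cdf(0, df, loc=mean, scale=sd))
print("beta:  ", stats.beta.cdf(0, a, b))          # exactly zero

# The beta is also right-skewed near 0%, unlike the symmetric normal/t
print("beta skewness:", stats.beta.stats(a, b, moments="s"))
```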
Replacing the elasticity model
The elasticity model refers to my attempt to model how responsive each electorate is to the statewide swing. While the elasticity model performed well in WA 2021, there was a bug in it which led to the district of Churchlands having a much lower estimated elasticity than should have been possible. In my post-mortem, I discussed the problem of using a linear function for predicting expected vote when one starts to get into massive-blowout territory. For example, if SA Labor got a 27% swing to it, a uniform swing would expect them to win 101.4% in the district of Croydon. While such a blowout is highly unlikely to begin with, in my view a model should attempt to be robust to events which are within the realm of possibility of its subject matter. It probably did not help that in WA 2021, there was a genuine possibility that Labor would win upwards of 70% of the two-party-preferred.
As part of that post-election analysis, I suggested a dynamic elasticity model. For example, let’s say you have a very elastic seat which favours the Liberals by 30% versus the rest of the state (e.g. it tends to see swings 1.2x greater than usual). Starting from a 50–50 statewide 2pp, that seat sits at a Liberal 2pp of about 80%, so a statewide swing to the Liberals of more than 16.7% (a seat swing of 1.2 × 16.7% ≈ 20%) would push a constant elasticity model above 100% in that seat.
A dynamic elasticity model – one where elasticity “shifts” in such a manner that 0% statewide = 0% in the seat, and 100% statewide = 100% in the seat – would fix this issue. After several tests, I’ve settled on the beta cumulative distribution function (CDF) for this purpose.
The one issue with the beta CDF is that it does not have a closed-form expression that would allow for rapid fitting. Hence, my methodology is to generate shape parameters iteratively until a selected stopping point. Shape parameters are selected based on how well they fit the following set of observations (each bullet point equally weighted):
- Actual vote data from the electorate e.g. estimated historical 2-party-preferred vote shares from Hartley (y-axis), versus the statewide vote share for the same party at each election (x-axis).
- Fitted “uniform swing” vote data from the electorate based on the electorate’s lean at the most recent election, versus the statewide vote share for the same party at each election. e.g. the figures for the Liberal 2pp in Hartley would be Electorate: {56.6%, 57.7%, 56.3%, 47.9%, 55.6%}, Statewide: {51.9%, 53.0%, 51.6%, 43.2%, 50.9%}.
- Fitted “elastic swing” vote data from the electorate, based on the electorate’s lean at the most recent election, estimated elasticity based on historical data, and the statewide vote share for the same party at each election. e.g. in Hartley, for an estimated elasticity of 1.1, the figures for the Liberal 2pp would be Electorate: {56.1%, 57.3%, 55.8%, 46.6%, 55.0%}, Statewide: {51.9%, 53.0%, 51.6%, 43.2%, 50.9%}.
“Fit” is determined by minimising the average deviation between the entire set of observations above and the “expected” vote for a given statewide vote in each observation. This selection of observations gives some weight to past results, helping to provide more context for ‘rogue’ elections, whilst also heavily weighting the lean of the electorate at the last election to account for the possibility of realignments. It also provides higher accuracy in backtesting than attempting to fit the function to just the actual vote data from the electorate (the first bullet point).
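For the curious, here’s a hedged sketch of that fitting procedure using the Hartley figures quoted above. A coarse grid search stands in for the iterative search, the “actual” vote series is omitted for brevity, and mean absolute deviation is used as the fit criterion; treat the specifics as assumptions rather than the production implementation.

```python
# Illustrative fit of beta CDF shape parameters to the Hartley-style observations
# quoted above (uniform-swing and elastic-swing series only).
import numpy as np
from scipy import stats

statewide = np.array([0.519, 0.530, 0.516, 0.432, 0.509])   # statewide Liberal 2pp
observed = {
    "uniform": np.array([0.566, 0.577, 0.563, 0.479, 0.556]),
    "elastic": np.array([0.561, 0.573, 0.558, 0.466, 0.550]),
}

def avg_deviation(a, b):
    """Mean absolute deviation of the beta-CDF prediction over all observations."""
    preds = stats.beta.cdf(statewide, a, b)
    return np.mean([np.abs(preds - series).mean() for series in observed.values()])

grid = np.linspace(0.5, 15, 60)
fit_err, alpha, beta = min(((avg_deviation(a, b), a, b) for a in grid for b in grid),
                           key=lambda t: t[0])
print(round(fit_err, 4), round(alpha, 2), round(beta, 2))
```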
This produces a function like the one below:
In backtesting (asking the model to predict a vote share for a district in a previous election, without having data from elections after that one), this method of fitting a beta CDF to historical + simulated data significantly outperforms uniform swing for all parties/groupings except the Others vote (unless I adjust for candidate effects). However, the combination of an adjusted beta CDF (basically, adjusting previous elections up or down where Others candidates had different candidate effects from the “new” set of Others candidates before fitting) and an Others vote predictor which accounts for the number of Others candidates, the statewide Others vote and candidate effects does beat uniform swing (unsurprisingly), and hence that’s the predictor we’ve opted to use.
Deviations from statewide swing
As in the 2021 WA model, we generate random deviations from statewide swing, although in this case it’s not so much “deviation from statewide swing” as “deviation from the expected district vote given the model based on said district’s voting history”.
As noted above, a key difference is that in this model, we do not account for the possibility of correlated deviations from statewide swing (e.g. wealthy inner Adelaide districts swinging together versus working-class outer Adelaide electorates etc). This is primarily because of the relative lack of data in SA, combined with past drastic redistributions, which in my view makes it very difficult to assess what proportion of deviations from statewide swing are correlated versus uncorrelated.
If marginal seats on the 2022 electoral boundaries tend to strongly correlate in their deviations from statewide swing (for those familiar with US politics, think Michigan/Wisconsin/Pennsylvania), then the model will likely somewhat under-estimate the uncertainty in the seat distribution, while if marginal seats tend to correlate less than expected by chance, the model will over-estimate the uncertainty in the seat distribution.
Unlike the 2021 WA model, this model takes into account the historical uncertainty in vote shares for each district, to better model electorates which tend to ignore the statewide swing (think Tasmania at federal elections). Additionally, it accounts for the expected share of the vote in each simulated district – in cases where the expected vote (from the beta CDF [Labor/Liberals/Greens] or the beta CDF plus other predictors [Others]) is closer to 0%, the uncertainty in the vote share has tended to be lower as well.
Minor change to the two-candidate match-up model
We’ve added an extra criterion for a party or candidate outside of the top two to overtake one of the top two parties/candidates and finish in the final-two count. For this to happen, the parties/groupings outside of the top two must combine for at least a third of the vote – something I probably should have included in the WA 2021 model but didn’t think of (not that it made much of a difference).
Here’s an example to illustrate the point. Let’s say the Liberals win 38% of the primary vote, Labor wins 32%, and all other parties – Greens, Independents, Family First, One Nation etc – combine for 30% of the vote. Intuitively, you can see how there is quite literally no way for a minor party or independent candidate to overtake one of the top two parties and finish up in the final-two count, even if every single voter from a minor party/independent candidate preferenced all other minor parties/independents over all major parties (which is in and of itself insanely unlikely!).
Minor change to the electorate two-party-preferred model
This is a fairly minor change. Basically, we don’t just rely on the two-party-preferred estimate generated from the simulated district primary vote shares and the simulated preference flows in each simulation; we also look at the history of the two-party-preferred in each electorate. This is important because in some electorates, voters for the same party (e.g. the Greens) consistently preference one side more strongly than voters for that party elsewhere in the state.
Previously, we estimated this adjustment by applying statewide preference flows at the last election to the primary vote data for each district, and comparing the “expected” two-party-preferred for each district to the actual two-party-preferred to develop a linear adjustment for each electorate. Here, I’ve decided to trial a new system which develops a two-party-preferred model for each electorate (as you may have seen in the graph above for Hartley); the expected electorate 2pp given the statewide 2pp is then averaged with the preference-flow-based 2pp to adjust for electorate differences in preference flows.
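In code terms, the blended estimate is something like the sketch below; the equal weighting reflects the “averaged” wording above, while the surrounding numbers are invented.

```python
# Sketch of the blended district 2pp: a preference-flow-based estimate averaged
# with the estimate from the district's own 2pp-vs-statewide-2pp model.
def blended_district_2pp(flow_based_2pp, history_based_2pp):
    """Simple average of the two district 2pp estimates."""
    return (flow_based_2pp + history_based_2pp) / 2

# e.g. the flows imply 52.4% Labor 2pp and the district model implies 54.0%
print(blended_district_2pp(0.524, 0.540))   # -> 0.532
```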
Thanks to anyone who’s made it this far! Additionally, I would like to thank fellow psephologists Dr Kevin Bonham and Ben Raue for providing me with access to their historical polling archives and historical election result data respectively – thanks very much for saving me weeks of trawling and processing those datasets!
Has the model been updated to include the poll released by The Advertiser on March 15 (Labor 56, Liberal 44)?
Yep, just updated. Currently the 95% confidence interval (i.e. what other outlets call “margin of error”) has shifted to Liberal 2pp 40% – 50%. SA Labor is now pretty clearly favoured for a majority.