How our model of preference distribution works

(if you haven’t seen our 3-candidate-preferred explorer for the 2018 Victorian state election, click here)

Preferential voting and the three-candidate-preferred

Historically, the outcome of Australian elections is often summed up using some version of the two-party pendulum, where seats are ordered based on the margin between the final two candidates (the two-candidate-preferred, or 2cp) as a measure of how “safe” each seat is. However, at the recent 2022 Australian federal election, the increase in votes for parties/candidates from outside the two largest parties/groupings resulted in a much higher number of close three-cornered contests. In many of these contests, the eventual winner was either nearly eliminated in the preferential-voting process (see below if you need a refresher) or came very close to facing a different opponent, who might have defeated them (e.g. Griffith).

(if you’re familiar with the three-candidate-preferred, click here to skip to the methodology)

How does the preferential voting system work?

Under the preferential voting system, voters rank the candidates on their ballot in order of which ones they prefer to be elected first. For example, let’s say we had four candidates running in an electorate, from the Labor, Liberal, National and Democrat parties.

A hypothetical voter might prefer that the National candidate is elected first of all, but if the National can’t win, they would prefer that the Liberal is elected, and then prefer the Democrat candidate over the Labor candidate. This voter would fill in their ballot as such:

1  FLUGGE, Trevor (NATIONAL)
2  TUCKEY, Wilson (LIBERAL)
3  PEEBLES, Shyama (AUSTRALIAN DEMOCRATS)
4  CHANCE, Kim (AUSTRALIAN LABOR PARTY)

In a House of Representatives (the lower house, where government is formed) election, all ballots are first processed and counted, and a primary vote (or first-preference vote) tally produced. This refers to the % of voters who put one party first. For example, if one in five voters put the National candidate first, then the National Party would have a primary vote of 20%.

Once all ballots have been processed and counted, the candidate with the lowest primary vote is eliminated, and their ballots are transferred to each voter’s next preference. This is repeated, one candidate at a time. For example, let’s say that in this election, each party has a primary vote of:

Liberal 49.5% Labor 25.1% National 23.1% Democrat 2.4%

The Democrat candidate will be eliminated first, and their votes transferred to each voter’s second preference. For simplicity’s sake, let’s assume that half of Democrat voters placed Labor 2nd, while a quarter each placed the Liberal and National candidates second.

Liberal 50.1% Labor 26.3% National 23.7%

Note that in the lower house (the Legislative Assembly), this is entirely controlled by who the voters place second on their ballot – candidates and parties do not have any control over where preferences go. A party or candidate may recommend preferences using how-to-votes and other material, but where the ballot travels next is entirely up to the voter.

You may occasionally hear of “preference deals” and “(party) directs preferences to (party)” in the news or other media. In the Legislative Assembly, this only refers to the parties’ ability to recommend that their voters put Party A over Party B. If a voter decides to ignore this recommendation and preference Party B over Party A, their ballot will go to Party B’s candidate at full value.

(This is different in Victoria’s upper house, the Legislative Council. There, if you vote 1 for a party, that party gets to decide who your preferences go to after them. Your preferences are only respected in the Legislative Council if you vote below the line (i.e. number candidates for the party).)

The proportion of primary votes for a certain party which are then transferred to another party is also known as the preference flow. In this case, the preference flow for Democrat votes would be 50% Labor, 25% Liberal and 25% National.

While preference flows are referred to as percentages, note that in the Legislative Assembly, there is no partial vote transfer. If you hear that the preference flow from the Greens to Labor is 80%, that doesn’t mean that 80% of each Green vote goes to Labor. It means that four of five (80%) Greens voters put the Labor candidate ahead of the other candidate on their ballot, while one in five (20%) put the other candidate ahead of Labor.

Preference flows are a useful way to calculate the outcome of a preferential-voting contest. For example, suppose that in an election Labor won 48%, the Liberals won 32% and the Nationals won 20%. If you know the National -> Liberal preference flow, you can calculate the final Labor-versus-Liberal result in that election.
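A minimal sketch of that calculation in Python. The 80% National -> Liberal flow is an assumed figure purely for illustration, and `two_candidate_preferred` is a hypothetical helper name, not part of any published tool.

```python
def two_candidate_preferred(finalist_a, finalist_b, eliminated, flow_to_a):
    """Add an eliminated candidate's votes to the two finalists.

    flow_to_a is the share of the eliminated candidate's voters who ranked
    finalist A above finalist B; everyone else is assumed to go to B.
    """
    a_total = finalist_a + eliminated * flow_to_a
    b_total = finalist_b + eliminated * (1 - flow_to_a)
    return a_total, b_total


# Labor 48%, Liberal 32%, National 20%, with an ASSUMED 80% National -> Liberal flow.
liberal_2cp, labor_2cp = two_candidate_preferred(32.0, 48.0, 20.0, flow_to_a=0.8)
print(f"Liberal {liberal_2cp:.1f}%, Labor {labor_2cp:.1f}%")
```

With a different flow (say 95%), the same primary votes would produce a much closer contest, which is why the flow matters as much as the primaries.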

Speaking of which, let’s finish our example preferential-voting election. As the National candidate has the lowest vote share of the remaining candidates, he is eliminated. Since our hypothetical voter from earlier voted 1 National 2 Liberal, their vote is then transferred to the Liberal. Had they instead voted 1 National 2 Democrat 3 Labor 4 Liberal, their vote would instead be transferred to Labor (as the Democrat candidate has already been eliminated).

For simplicity’s sake, let’s assume that 80% of all voters who voted 1 National, or 1 Democrat 2 National, then place the Liberal candidate over the Labor candidate.
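The whole example count can be replayed in a few lines of Python. This is only a sketch using the shares and assumed flows from the example; `eliminate` is a hypothetical helper, and the final figures it produces are a consequence of those assumptions (note the example primaries sum to 100.1% because of rounding).

```python
def eliminate(tallies, flows):
    """Remove the lowest-polling candidate and transfer their votes.

    flows[candidate] maps each recipient to the share of that candidate's
    ballots which flow to them.
    """
    loser = min(tallies, key=tallies.get)
    votes = tallies[loser]
    remaining = {c: v for c, v in tallies.items() if c != loser}
    for recipient, share in flows[loser].items():
        remaining[recipient] += votes * share
    return loser, remaining


primaries = {"Liberal": 49.5, "Labor": 25.1, "National": 23.1, "Democrat": 2.4}
flows = {
    "Democrat": {"Labor": 0.5, "Liberal": 0.25, "National": 0.25},
    "National": {"Liberal": 0.8, "Labor": 0.2},  # the 80% assumption
}

first_out, after_first = eliminate(primaries, flows)
second_out, final_two = eliminate(after_first, flows)
print(first_out, {c: round(v, 1) for c, v in after_first.items()})
print(second_out, {c: round(v, 2) for c, v in final_two.items()})
```

Under these assumptions the final two come out at roughly Liberal 69.1 and Labor 31.0.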

The vote shares of the final two candidates are often referred to as the two-candidate-preferred, or 2cp for short. It is sometimes also referred to as the two-party-preferred; however, this can be confusing as the two-party-preferred is often also used to refer to the share of voters who preferred a Labor candidate over a Liberal/National Coalition candidate, even in seats where a minor party or independent candidate made the final two.

It’s difficult to accurately place such electorates on a classic two-party pendulum using their two-candidate-preferred result. For example, the Division of Macnamara is classified as a “safe Labor” seat based on its 2cp (Labor 62.25%, Liberal 37.75%). However, once preferences were distributed, the vote share of the final three candidates in Macnamara was:

3-candidate-preferred for Macnamara at the 2022 federal election: Liberal 33.67%, Labor 33.48%, Green 32.84%

These figures, often referred to as the three-candidate-preferred (3cp), paint a very different picture of the result in Macnamara versus the two-candidate-preferred. While Macnamara is nominally a “safe Labor” seat in a Labor-vs-Liberal matchup, the 3cp shows that Labor was just 0.64% off failing to make it into the final-two at all.

At this point, I thought to myself – wouldn’t it be interesting to make a tool that allows people to examine the three-candidate-preferred at the recent federal election?

And then I thought to myself – why not do one for the previous Victorian state election? After all, there’s another state election coming up there.

Big mistake.

A scene from Rick and Morty: RICK: (opens portal) Let's go. In and out, 20 minute adventure. *cuts to the completion of the adventure* *caption: 3am, several hours later* *both Rick and Morty have eyebags and are crying*

As it turns out, the Victorian Electoral Commission (VEC) does not perform a full distribution of preferences in electorates where one candidate has an outright majority (>50%) of the first-preference vote, or ends up with an outright majority at any point in the count. This meant that there were several electorates for which an exact three-candidate-preferred figure could not be computed, as preferences were either never distributed or were not distributed to the point where only three candidates were left in the count.

Hence, if I wanted to provide 3cp estimates for all electorates, I’d have to calculate my own for some of them. Below is the full detail of how my current model of preference distribution works – we’ll probably be using this model or something like it for future forecasts too, so I thought having this out as a reference piece would be useful.

(if you’re more interested in how our preference distribution model performs in backtesting than how it works, click here)

How our model works

Step 1: Collect data on preference distributions at the last election

To be able to model preference distributions, you need to know something about preference distributions. Since preference flows at the most recent election tend to be the best predictor of future preference flows, we don’t use data from two or more elections ago in this process.

This involves scraping data from HTML tables, and reprocessing it in both machine-friendly formats (CSV, Excel) as well as formats which are actually useful for any further research we want to do (e.g. which party was eliminated in each transfer, and who did their preferences go to?). To save anyone else who’s interested in VIC 2018 some trouble, I’ve uploaded the distribution of preferences conducted by the VEC here, formatted in the same way as the distribution of preferences dataset released by the federal Australian Electoral Commission.

Step 2: Produce classifications for parties who ran at the election

Every party who ran a candidate at the election was classified based on their ideology as well as which parties they had similar preference flows to. The latter is important as these classifications are being used for the purposes of modelling preference flows – a party which endorses far-right policies but whose voters, for whatever reason, preference left-wing parties would be classified as being “Left” in this system. Two classifications were assigned, a more specific “Type” as well as a more general “Category” (e.g. left/right/centre); you can see my classifications for parties which ran at the 2018 Victorian state election here.

This is important because some parties run a very small number of candidates. Hence, sometimes we may not have any data (or very little data) on the preference flows to/from these parties, as they ran in seats where no preference distribution was undertaken. To deal with this, we may have to rely on data on preference flows to/from parties like them, e.g. using preference transfers from “left” parties to a Transport Matters candidate, instead of specifically preference flows from the Victorian Socialists.

Step 3: Identify electorates which require preference distributions

In particular, for the 2018 Victorian election 3cp explorer, we need to differentiate between electorates where no distribution of preferences was undertaken at all (i.e. seats where one candidate won a majority on first-preferences) and electorates where the distribution of preferences was simply incomplete. The former are much more complicated to model, for reasons which should become apparent below.

For an election model, this would be any electorate where all parties other than the top two combine for more votes than the second-placed candidate. Conversely, in a simulation where Labor wins 36% of first preferences and the Green wins 34%, all parties and candidates other than Labor and the Green would only combine for 30% of the vote, making it mathematically impossible for the matchup to be anything other than Labor-vs-Green. In those simulations, the model simply calculates the two-candidate-preferred using simulated preference flows between the final two parties (or the two-party-preferred, if it is a Labor-vs-Lib/Nat matchup) as it’s a lot less computationally expensive to do so.
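The election-model test described here can be sketched as a short Python function. `needs_full_distribution` is a hypothetical name, and the second set of primaries below is made up for illustration.

```python
def needs_full_distribution(primaries):
    """True if the final two cannot be read straight off the primary votes.

    A full distribution is needed when everyone outside the top two
    combines for more votes than the second-placed candidate.
    """
    ranked = sorted(primaries.values(), reverse=True)
    if len(ranked) < 3:
        return False
    second_place = ranked[1]
    everyone_else = sum(ranked[2:])
    return everyone_else > second_place


# The Labor 36 / Green 34 example: the other 30% cannot change the matchup.
locked_in = needs_full_distribution({"ALP": 36.0, "GRN": 34.0, "LIB": 20.0, "OTH": 10.0})

# HYPOTHETICAL three-cornered contest: third place could still overtake second.
open_contest = needs_full_distribution({"LIB": 40.0, "ALP": 25.0, "GRN": 22.0, "IND": 13.0})
print(locked_in, open_contest)
```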

From this point on, all steps (until step 9) were conducted on an electorate-by-electorate basis. In other words, steps 4-9 were looped for each electorate which required a preference distribution.

Step 4: Determine the preference transfer needed

This basically involves identifying the candidate (and associated party) which has the lowest number of votes as the candidate/party to be “eliminated”, and identifying which parties would remain in the count to receive preferences from said candidate/party.

Taking the district of Malvern as an example – for the first count, we would identify the Animal Justice candidate as the party to be eliminated, and the parties to receive would be Liberal, Labor, Green and Sustainable Australia. This process becomes more complicated with Independent candidates – we assign each independent a unique identifier that lets us assign different preference flows to different Independent candidates. In an election model, we’d probably go through the effort of classifying each independent on the basis of which category (see step 2) their preference flows would be most similar to. However, in the case of the VIC 2018 3cp explorer, this was not undertaken as the amount of effort required wouldn’t be worth the increase in accuracy.
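As a rough Python sketch of this step: the function names, the `IND#n` tagging scheme and the Malvern vote counts below are all hypothetical and only illustrate the shape of the logic.

```python
from collections import Counter


def label_independents(parties):
    """Tag each independent uniquely (IND#1, IND#2, ...) so they can be given different flows."""
    seen = Counter()
    labelled = []
    for party in parties:
        if party == "IND":
            seen[party] += 1
            labelled.append(f"IND#{seen[party]}")
        else:
            labelled.append(party)
    return labelled


def next_transfer(tallies):
    """Return the party to eliminate (lowest tally) and the parties left to receive votes."""
    eliminated = min(tallies, key=tallies.get)
    recipients = [party for party in tallies if party != eliminated]
    return eliminated, recipients


# HYPOTHETICAL vote counts for a Malvern-like first count.
malvern = {"LIB": 20000, "ALP": 13000, "GRN": 5000, "SUS": 900, "AJP": 800}
eliminated, recipients = next_transfer(malvern)
print(eliminated, recipients)
print(label_independents(["ALP", "IND", "LIB", "IND"]))
```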

Step 5a: Find identical preference transfers in other electorates

Using Malvern as an example – we’d try to find any preference transfers where an Animal Justice candidate was eliminated and their votes transferred to a Liberal, a Labor, a Green and a Sustainable Australia candidate.

If no such preference transfer could be found, the model tries to:

Step 5b: Find preference transfers containing the same recipients

Continuing with the example of Malvern, this step would also accept a preference transfer where an Animal Justice candidate was eliminated and their votes transferred to a Liberal, a Labor, a Green, a Sustainable Australia and an Independent candidate.

In addition, the model also looks for:

Step 5c: Find preference transfers containing at least two of the same recipients

Again, continuing with the example of Malvern, this step would also accept a preference transfer where an Animal Justice candidate was eliminated and their votes transferred to a Liberal, a Labor, and a Shooters/Fishers/Farmers candidate, or a preference transfer where an Animal Justice candidate was eliminated and their votes transferred to a Liberal, a Green and an Independent candidate.
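One way to picture steps 5a to 5c is as a function which labels each past transfer by how closely it matches the transfer being modelled. This is a sketch only: `match_tier` and the (source, recipients) tuple layout are assumptions, and it deliberately says nothing about how the matches are then pooled.

```python
def match_tier(past_transfer, source, recipients):
    """Classify how closely a past transfer matches the one being modelled.

    past_transfer: (source_party, set_of_recipient_parties) from a real count.
    Returns "5a" (identical recipients), "5b" (same recipients plus extras),
    "5c" (at least two shared recipients) or None (no usable match).
    """
    past_source, past_recipients = past_transfer
    wanted = set(recipients)
    if past_source != source:
        return None
    if past_recipients == wanted:
        return "5a"
    if wanted <= past_recipients:
        return "5b"
    if len(wanted & past_recipients) >= 2:
        return "5c"
    return None


wanted = ["LIB", "ALP", "GRN", "SUS"]  # Malvern, with AJP eliminated
examples = [
    ("AJP", {"LIB", "ALP", "GRN", "SUS"}),
    ("AJP", {"LIB", "ALP", "GRN", "SUS", "IND"}),
    ("AJP", {"LIB", "ALP", "SFF"}),
    ("AJP", {"LIB", "GRN", "IND"}),
    ("SOC", {"LIB", "ALP", "GRN"}),
]
tiers = [match_tier(t, "AJP", wanted) for t in examples]
print(tiers)
```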

If none of the above could be found, the model starts looking for:

Step 5d: Find preference transfers from similar parties

Sticking with Malvern: this step would start out by accepting a preference transfer from a Victorian Socialist (also classified as “Left” on the basis of similar preference flows) to a Liberal, a Labor and a Green candidate; or a preference transfer from a Reason Party candidate to a Labor, a Green and a Derryn Hinch’s Justice candidate.

If this fails, it then tries the broader “Category” (see step 2) classification system. So for example, if the candidate to be eliminated is from the Liberal Democratic Party (classified as type “RightLibertarian”), and no preference transfers from the LDP or from the Australian Liberty Alliance (the two “RightLibertarian” parties) could be found, the model begins to accept preference transfers from any party classified as right-leaning, with at least two matching recipients. For example, in the LDP example, it would accept a preference transfer from the Democratic Labour Party to Labor, Liberal, Green candidates as being a “similar” preference transfer.

If this still fails – which does happen when one of the smaller parties is included as a recipient – the model finally searches for:

Step 5e: Find preference transfers from similar parties, to similar parties

Similar to step 5d, except that the search-widening process is applied to the recipient parties in addition to being applied to the eliminated party. e.g. if the candidate to be eliminated is from the Liberal Democratic Party (type “RightLibertarian”, category “Right”), and the recipients are Labor (type “ALP”, category “ALP”), Reason (type “Left”, category “Left”), and the Democratic Labour Party (type “Solidarist”, category “Right”), this search would begin by accepting any preference transfer which matched the following characteristics:

Accept transfers from any of:	Accept transfers where parties from at least two of these columns receive votes:
Type "RightLibertarian"	Type "ALP"	Type "Left"	Type "Solidarist"
Liberal Democratic Party	Australian Labor Party	Reason Party	Democratic Labour Party
Australian Liberty Alliance		Animal Justice Party	Shooters, Fishers and Farmers
		Victorian Socialists

If this fails, the search is further expanded to the more general “Category” classification. Sticking with the above example (eliminated: LDP, category “Right”, recipients: Labor, Reason, DLP, categories: “ALP”, “Left”, “Right”), the search would accept any preference transfer with the following characteristics:

Accept transfers from any of:	Accept transfers where parties from at least two of these columns receive votes:
Category "Right"	Category "ALP"	Category "Left"	Category "Right"
Liberal Democratic Party	Australian Labor Party	Reason Party	Democratic Labour Party
Australian Liberty Alliance		Animal Justice Party	Shooters, Fishers and Farmers
Australian Battlers Party		Victorian Socialists	Australian Country Party
Australian Country Party		Australian Greens	Australian Battlers Party
Democratic Labour Party			Sustainable Australia
Shooters, Fishers and Farmers			Liberal Democratic Party
Sustainable Australia			Australian Liberty Alliance

Note, the duplication of the Liberal Democratic Party in both the "accept transfers from" and "accept transfers to" section is intentional - there have been instances where a political party ran two candidates in the same seat.
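The widening in steps 5d and 5e can be sketched in Python as a lookup on the Type/Category classification. The dictionary below only lists parties whose type and category both appear in the two tables above. `is_similar_transfer` is a hypothetical helper, and collapsing recipient columns which share a classification into one is a simplification of this sketch.

```python
# (type, category) pairs read off the two tables above.
CLASSIFICATION = {
    "Liberal Democratic Party": ("RightLibertarian", "Right"),
    "Australian Liberty Alliance": ("RightLibertarian", "Right"),
    "Australian Labor Party": ("ALP", "ALP"),
    "Reason Party": ("Left", "Left"),
    "Animal Justice Party": ("Left", "Left"),
    "Victorian Socialists": ("Left", "Left"),
    "Democratic Labour Party": ("Solidarist", "Right"),
    "Shooters, Fishers and Farmers": ("Solidarist", "Right"),
}
LEVELS = {"type": 0, "category": 1}


def group_of(party, level):
    """Return the party's type or category, or None if it is unclassified here."""
    entry = CLASSIFICATION.get(party)
    return entry[LEVELS[level]] if entry else None


def is_similar_transfer(past_source, past_recipients, source, recipients, level):
    """Accept a past transfer whose source is in the same group as ours and
    whose recipients cover at least two of our recipients' groups."""
    if group_of(past_source, level) != group_of(source, level):
        return False
    wanted = {group_of(p, level) for p in recipients}
    covered = {group_of(p, level) for p in past_recipients} & wanted
    covered.discard(None)
    return len(covered) >= 2


target = ["Australian Labor Party", "Reason Party", "Democratic Labour Party"]
ala_ok = is_similar_transfer(
    "Australian Liberty Alliance", ["Australian Labor Party", "Animal Justice Party"],
    "Liberal Democratic Party", target, "type")
dlp_type = is_similar_transfer(
    "Democratic Labour Party", ["Australian Labor Party", "Victorian Socialists"],
    "Liberal Democratic Party", target, "type")
dlp_category = is_similar_transfer(
    "Democratic Labour Party", ["Australian Labor Party", "Victorian Socialists"],
    "Liberal Democratic Party", target, "category")
print(ala_ok, dlp_type, dlp_category)
```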

Step 5f: Repeat steps 5d-5e if certain pairings could not be found

In some cases, step 5c works perfectly fine in terms of finding some pairings containing the relevant parties, but misses one or more pairings. For example, in a transfer from a Reason Party candidate to Labor, Liberal, Socialist and Transport Matters candidates, it might successfully find transfers from:

Reason to Labor + Liberal
Reason to Labor + Socialist
Reason to Labor + Transport Matters
Reason to Liberal + Socialist
Reason to Liberal + Transport Matters

However, it might not be able to find a transfer from Reason, to candidates including a Socialist and a Transport Matters candidate. In that situation, steps 5d-5e are used to find a suitably-similar preference transfer.
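Checking for missed pairings is a small set operation. In this Python sketch, `missing_pairings` is a hypothetical helper and each found transfer is represented only by the set of parties which received votes in it.

```python
from itertools import combinations


def missing_pairings(recipients, found_transfers):
    """Return recipient pairs which never appear together in any found transfer."""
    needed = {frozenset(pair) for pair in combinations(recipients, 2)}
    covered = set()
    for past_recipients in found_transfers:
        present = [p for p in recipients if p in past_recipients]
        covered |= {frozenset(pair) for pair in combinations(present, 2)}
    return needed - covered


recipients = ["Labor", "Liberal", "Socialist", "Transport Matters"]
found = [
    {"Labor", "Liberal"},
    {"Labor", "Socialist"},
    {"Labor", "Transport Matters"},
    {"Liberal", "Socialist"},
    {"Liberal", "Transport Matters"},
]
gaps = missing_pairings(recipients, found)
print(gaps)
```

Any pair returned here is what steps 5d-5e would then be asked to fill.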

Step 6: Assign weights to the relevant preference transfers

In this step, we assign weights to the collected list of “similar” preference transfers based on how similar they are to the preference transfer of interest. In particular, preference transfers from electorates in the same Legislative Council region are collectively assigned a weight of at least 50%. This weighting only applies if there is more than one preference transfer which took place in the same Legislative Council region, and if those preference transfers would have collectively received less than 50% of the weight otherwise.

For example:

PrefTransfer1 – same region, PrefTransfer2 – same region, PrefTransfer3 – different region: no additional weighting applied

PrefTransfer1 – same region, PrefTransfer2 – same region, PrefTransfer3 – different region, PrefTransfer4 – different region, PrefTransfer5 – different region: PrefTransfer1 and PrefTransfer2 weighted at 50% collectively (individual weights may differ based on factors below)

In addition, weights are applied as follows:

Share of the transferred vote which is the eliminated party’s own first-preference vote: So for example, if the Green started out with 10% in first preferences, and had received 1% from a Socialist candidate before being eliminated, the weight would be 10/(10 + 1) = 0.909. This is because with such preference transfers, there may be voters for another party whose preferences flow differently in the mix, which would make this a less-accurate model of the preference flows for the party to be eliminated.
Number of parties who received votes in this transfer, but are not receiving votes in the transfer to be modelled: For example, let’s say we want to model a transfer from the Australian Liberty Alliance to Labor, Liberal, and Green candidates. The preference flows in such a transfer might be very different if there was a Liberal Democratic Party candidate in the mix; hence preference flows with additional candidates which we’re not interested in modelling are penalised, by multiplying their weight factor by 1/(1 + number of additional candidates).
Type of preference transfer: In particular, preference transfers sourced from steps 5d and 5e have their weights multiplied by 1/2 and 1/4 respectively. In most cases this is practically equal to not weighting them in the first place, as the model only goes to those steps once all others have failed; hence they end up providing the only preference transfers available for those parties.

Below is an example of the preference transfer weighting table, from the District of Bellarine. Bellarine is located in the Legislative Council region of Western Victoria, with the party to be eliminated being the Animal Justice Party and the parties to receive votes being Labor (ALP), Liberal (LIB), and the Greens (GRN).

District	Region	Transfer from:	Transfer to:	Region-based weighting	Percent from first-preferences	Number of parties which are not modelled	Within-region weight	Final weight
Bayswater	Eastern Metropolitan	AJP	LIB,ALP,GRN	4.8*10^-7	100	0	1	4.8*10^-7
Croydon	Eastern Metropolitan	AJP	ALP,LIB,GRN	4.8*10^-7	100	0	1	4.8*10^-7
Warrandyte	Eastern Metropolitan	AJP	ALP,GRN,LIB	4.8*10^-7	100	0	1	4.8*10^-7
Hastings	Eastern Victoria	AJP	GRN,ALP,LIB	4.8*10^-7	100	0	1	4.8*10^-7
Brunswick	Northern Metropolitan	AJP	IND,RP,ALP,LIB,GRN	4.8*10^-7	91.3	2	0.304	1.5*10^-7
Melbourne	Northern Metropolitan	AJP	GRN,LIB,ALP,RP	4.8*10^-7	85.9	1	0.43	2.0*10^-7
Northcote	Northern Metropolitan	AJP	GRN,LIB,ALP,RP	4.8*10^-7	91	1	0.455	2.2*10^-7
Pascoe Vale	Northern Metropolitan	AJP	LIB,ALP,GRN,IND,SOC,IND	4.8*10^-7	98.1	3	0.245	1.2*10^-7
Frankston	South Eastern Metropolitan	AJP	DHJ,GRN,DLP,LIB,ALP	4.8*10^-7	85	2	0.283	1.4*10^-7
Albert Park	Southern Metropolitan	AJP	LIB,GRN,ALP	4.8*10^-7	78.1	0	0.781	3.7*10^-7
Brighton	Southern Metropolitan	AJP	ALP,LIB,GRN	4.8*10^-7	87.2	0	0.872	4.2*10^-7
Burwood	Southern Metropolitan	AJP	SUS,ALP,GRN,LIB	4.8*10^-7	100	1	0.5	2.4*10^-7
Caulfield	Southern Metropolitan	AJP	LIB,ALP,GRN	4.8*10^-7	83.2	0	0.832	4.0*10^-7
Hawthorn	Southern Metropolitan	AJP	GRN,LIB,ALP,SUS	4.8*10^-7	95.6	1	0.478	2.3*10^-7
Kew	Southern Metropolitan	AJP	GRN,LIB,ALP	4.8*10^-7	76.4	0	0.764	3.6*10^-7
Prahran	Southern Metropolitan	AJP	LIB,ALP,GRN	4.8*10^-7	67	0	0.67	3.2*10^-7
Sandringham	Southern Metropolitan	AJP	LIB,IND,ALP,GRN	4.8*10^-7	83.1	1	0.416	2.0*10^-7
Geelong	Western Victoria	AJP	LIB,GRN,ALP,IND	0.1	85.5	1	0.428	0.0428
Melton	Western Victoria	AJP	IND,LIB,IND,GRN,IND,IND,ALP	0.1	81.8	4	0.164	0.0164
Ripon	Western Victoria	AJP	DHJ,DLP,LIB,ALP,SFF,GRN	0.1	89.8	3	0.225	0.0225
South Barwon	Western Victoria	AJP	IND,GRN,LIB,ALP	0.1	82.7	1	0.414	0.0414
Wendouree	Western Victoria	AJP	GRN,LIB,ALP	0.1	83.6	0	0.836	0.0836

As preference transfers from outside Western Victoria outnumbered those within Western Victoria, 50% of the weight was first divided up evenly among the Western Victoria preference transfers, with the other 50% being divided up between the preference transfers outside Western Victoria (“Region-based weighting”). Similar preference transfers were then weighted on the basis of what % of the vote transferred was the Animal Justice candidate’s first preferences (“Percent from first-preferences”) and the number of recipient parties other than the ALP/LIB/GRN (“Number of parties which are not modelled”). As all preference transfers were derived from steps 5a-5c, no preference transfer type weighting was applied. The weights from % first preferences and number of parties not modelled were then multiplied together to get the within-region weight, which was finally multiplied by the region-based weighting to get the final weight. Before being applied, the final weight was recalculated to ensure that the sum of all weights added to 1.
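The weighting arithmetic can be sketched in Python as below. This is one reading of the description, not the model's actual code. The dictionary keys and function names are hypothetical, the 50% split is triggered simply when out-of-region transfers outnumber in-region ones, and the out-of-region half is split evenly. That even split gives 0.1 for each Western Victoria transfer in the Bellarine table, but it does not reproduce the table's out-of-region figures.

```python
STEP_FACTOR = {"5d": 0.5, "5e": 0.25}  # steps 5a-5c are left unscaled


def within_region_weight(pct_first_pref, n_extra, step="5a"):
    """First-preference share x penalty for extra recipients x step penalty."""
    return (pct_first_pref / 100) / (1 + n_extra) * STEP_FACTOR.get(step, 1.0)


def transfer_weights(transfers, home_region):
    """Return normalised weights, one per transfer (dicts with region, pct_first_pref, n_extra)."""
    same = [t["region"] == home_region for t in transfers]
    n_same = sum(same)
    n_other = len(transfers) - n_same
    # ASSUMPTION: apply the 50/50 regional split only when there is more than
    # one in-region transfer and in-region transfers are outnumbered.
    boost = n_same > 1 and n_other > n_same
    raw = []
    for t, is_same in zip(transfers, same):
        region_w = 0.5 / (n_same if is_same else n_other) if boost else 1.0
        raw.append(region_w * within_region_weight(
            t["pct_first_pref"], t["n_extra"], t.get("step", "5a")))
    total = sum(raw)
    return [w / total for w in raw]


def toy(region):
    """Build a HYPOTHETICAL transfer with no other penalties."""
    return {"region": region, "pct_first_pref": 100, "n_extra": 0}


three = transfer_weights([toy("W"), toy("W"), toy("E")], "W")
five = transfer_weights([toy("W"), toy("W"), toy("E"), toy("E"), toy("E")], "W")
print(three, five)
print(round(within_region_weight(91.3, 2), 3))  # Brunswick row of the table
```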

I find that applying these weights, as opposed to simply treating all preference transfers equally, reduces the average retrodiction error in 3cp backtests from 0.92% to 0.87%. That may sound small but in relative terms, it’s a 5.4% reduction in error size. Additionally, note that weights were unable to be applied to all transfers as sometimes you’ll only find one transfer from a party A to parties B and C, or transfers whose variables (region, percent from first-preferences etc) are all identical. Hence this reduction in error was achieved solely from the set of preference transfers where weighting was possible.

Step 7: Calculate the ratio of preferences between parties

Finally, for each preference transfer, we calculate the ratio of preferences between the recipient parties. For example, if Labor received 25%, the Liberal received 10%, and the Green received 40% of the vote in a transfer, the ratios would be Labor:Liberal 2.5:1, Labor:Green 1:1.6, and Liberal:Green 1:4.

This is basically a way of aggregating data on preference flows between preference transfers where different numbers of candidates may receive preferences. A simple average of the preference flows would be inaccurate as we’d expect each candidate to receive a smaller % of preferences in transfers where there’s eight candidates receiving preferences versus those where there’s just four.
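The ratio calculation itself is a one-liner. A Python sketch, where `pairwise_ratios` is a hypothetical name and a recipient with zero votes would need special handling that is not shown here:

```python
from itertools import combinations


def pairwise_ratios(shares):
    """Return {(a, b): a's share divided by b's share} for every pair of recipients."""
    return {(a, b): shares[a] / shares[b] for a, b in combinations(shares, 2)}


ratios = pairwise_ratios({"Labor": 25.0, "Liberal": 10.0, "Green": 40.0})
for (a, b), value in ratios.items():
    print(f"{a}:{b} = {value:.3f}:1")
```

Here Labor:Green comes out as 0.625:1, which is the same thing as 1:1.6, and Liberal:Green as 0.25:1, i.e. 1:4.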

Step 8: Construct a combined ratio of preferences between parties

For each potential ratio between any two parties (e.g. Labor/Liberal, Liberal/Green, Labor/Green), a weighted average of the ratios in the preference transfer data is constructed. For example, if we had two Labor/Liberal ratios, 2.5:1 and 3.3:1, with the associated weights of 40% and 60%, the predicted Labor/Liberal ratio of preferences would be about 3:1 (0.4 × 2.5 + 0.6 × 3.3 = 2.98).

Weights are only applied within ratios. So for example, let’s say we only have two preference transfers with Liberal/Green recipients, with the weights 10% and 5% respectively. In this case, the weights would be recalculated to 67% and 33% as there are no other transfers providing the relevant information.
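Both points, the weighted average and the rescaling of weights within a ratio, fit in a short Python sketch. `combined_ratio` and `renormalise` are hypothetical names.

```python
def renormalise(weights):
    """Rescale weights so that the ones available for this ratio sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def combined_ratio(ratios, weights):
    """Weighted average of the observed ratios for one pair of parties."""
    return sum(r * w for r, w in zip(ratios, renormalise(weights)))


labor_liberal = combined_ratio([2.5, 3.3], [0.4, 0.6])
liberal_green_weights = renormalise([0.10, 0.05])
print(round(labor_liberal, 2), [round(w, 2) for w in liberal_green_weights])
```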

Once every receiving party has at least one ratio connecting it with another receiving party, the model then attempts to construct a complete ratio of every party to every other party (in the example above, it attempts to construct Labor:Liberal:Green from Labor:Liberal, Liberal:Green and Labor:Green). To do so, it first tries to multiply each ratio out; so in the example above (Labor:Liberal 2.5:1, Labor:Green 1:1.6, and Liberal:Green 1:4), it might estimate Labor:Liberal:Green to be 2.5 : 1 : 4.

In some cases the solution isn’t that simple (e.g. if Liberal:Green was 1:5, that would mess up the Labor:Green ratio). In those cases, it tries multiplying each ratio out first, and then it generates a bunch of combined ratios with each number varying in units of 0.1 and up to 5 either way (sticking with the above example, it’d generate Labor 0-7.5, Liberal 0-6, and Green 0-9). It then calculates each inter-party ratio for all generated combined ratios and picks the one which minimizes the differences between the generated inter-party ratios and the actual inter-party ratios. Another set of combined ratios is generated using the previous “best” one, this time varying in units of 0.01 and up to 1.0 either way and the same procedure is used to pick the “best” combined ratio of preferences.
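A sketch of the coarse pass of this search in Python. Several details here are assumptions rather than the model's actual choices. The mismatch is measured as the sum of squared differences between ratios, the grid starts at one step rather than zero to avoid dividing by zero, and `grid_refine` and `ratio_error` are hypothetical names.

```python
from itertools import product


def ratio_error(values, targets):
    """Sum of squared gaps between the implied and the target inter-party ratios."""
    return sum((values[i] / values[j] - t) ** 2 for (i, j), t in targets.items())


def grid_refine(centre, targets, span, step):
    """Try every combination within +/- span of centre, in increments of step."""
    axes = []
    for c in centre:
        low = max(step, c - span)  # ASSUMPTION: skip zero so ratios stay defined
        count = int(round((c + span - low) / step)) + 1
        axes.append([round(low + k * step, 10) for k in range(count)])
    return min(product(*axes), key=lambda values: ratio_error(values, targets))


# Parties are indexed 0 = Labor, 1 = Liberal, 2 = Green.
start = (2.5, 1.0, 4.0)  # from multiplying the ratios out
targets = {(0, 1): 2.5, (0, 2): 1 / 1.6, (1, 2): 1 / 5}  # Liberal:Green now 1:5
coarse = grid_refine(start, targets, span=5.0, step=0.1)
print(coarse, ratio_error(coarse, targets))
```

The finer pass would be the same call with `span=1.0, step=0.01` centred on `coarse`; it is left out here because it checks about eight million combinations for three parties.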

Step 9: Calculate preference flows, and distribute preferences

The final combined ratio of preferences between parties is then used to generate a set of preference flows. Using the example of Labor:Liberal:Green as 2.5 : 1 : 4, the preference flows would be Labor 33.3%, Liberal 13.3% and Green 53.3%. The votes of the candidate who was eliminated are then redistributed accordingly and the new totals rounded to the nearest decimal place.
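In Python, turning the combined ratio into flows and applying them might look like the sketch below. The function names and the tallies are hypothetical, and the final rounding step is left out.

```python
def flows_from_ratio(ratio):
    """Convert a combined ratio (e.g. 2.5 : 1 : 4) into shares which sum to 1."""
    total = sum(ratio.values())
    return {party: value / total for party, value in ratio.items()}


def distribute(tallies, eliminated, flows):
    """Hand the eliminated candidate's votes to the remaining candidates."""
    votes = tallies[eliminated]
    return {party: tally + votes * flows[party]
            for party, tally in tallies.items() if party != eliminated}


flows = flows_from_ratio({"Labor": 2.5, "Liberal": 1.0, "Green": 4.0})
print({party: f"{100 * share:.1f}%" for party, share in flows.items()})

# HYPOTHETICAL count with a minor candidate on 7.5% about to be eliminated.
tallies = {"Labor": 40.0, "Liberal": 35.0, "Green": 17.5, "Other": 7.5}
after = distribute(tallies, "Other", flows)
print(after)
```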

In an election model, the preference flows of each simulated electorate would then have random variance generated on top of them using a Dirichlet distribution based on the variance in similar preference transfers collected above. Each electorate in each simulation would therefore use a different preference flow, even if it’s to perform the same distribution of preferences – this is to simulate the uncertainty we have surrounding what the actual preference flow will be.
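A minimal Python sketch of drawing a randomised flow from a Dirichlet distribution, using only the standard library (a Dirichlet draw is a set of independent gamma draws divided by their sum). The `concentration` parameter, which controls how tightly draws cluster around the mean flow, is a stand-in: how the model actually maps the observed variance onto the Dirichlet parameters is not specified here.

```python
import random


def sample_flows(mean_flows, concentration, rng):
    """Draw one set of preference flows from a Dirichlet centred on mean_flows.

    Higher concentration means less spread around the mean (an ASSUMED knob).
    """
    draws = {party: rng.gammavariate(share * concentration, 1.0)
             for party, share in mean_flows.items()}
    total = sum(draws.values())
    return {party: draw / total for party, draw in draws.items()}


rng = random.Random(2018)  # fixed seed so the sketch is repeatable
mean_flows = {"Labor": 1 / 3, "Liberal": 2 / 15, "Green": 8 / 15}
for _ in range(3):
    simulated = sample_flows(mean_flows, concentration=50, rng=rng)
    print({party: round(share, 3) for party, share in simulated.items()})
```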

This entire process – steps 4 to 9 – is then repeated for each electorate until three candidates remain, at which point the three-candidate-preferred can be calculated. In an election model, steps 4-9 would be repeated until the final two candidates to estimate a 2-candidate-preferred.

(this is part of the reason why the election model often takes some time to update – imagine having to do this for every electorate which requires a distribution of preferences, and then doing it 10⁵ times. It’s in large part why I said it’d be very difficult to offer options to simulate primary vote swings as part of the Simulate-It-Yourself tool; there are some computational shortcuts I can take, but in many cases there aren’t a lot of ways around manually simulating a distribution of preferences to know who the final-two candidates are. e.g. in Waite (SA 2022), an Independent very nearly won from fourth place on 14.6% of the primary vote – you’d only know that if you tried simulating a distribution of preferences)

How does our preference distribution model perform in backtesting?

So after all that work, how accurate is our model?

To test it, we can use leave-one-out cross-validation, where we hide one electorate whose preference transfers and result we know from the model and tell it to simulate a distribution of preferences for that electorate. Here, since this model was used to estimate a three-candidate-preferred for our Victoria result explorer, I’ve told the model to calculate the three-candidate-preferred for each of the 45 districts in which a 3cp was provided.

Does it get the final set of candidates right?

One of the more important things is to get the parties/affiliations of the final three candidates right. If you know who the final candidates are (e.g. ALP/LIB/GRN), calculating a 3cp or 2cp is much easier as preference flows often don’t shift much between elections. On the other hand, relatively small errors at the early stages of a simulated preference distribution can snowball into a completely incorrect 3cp – the Labor/Liberal share of the vote would be quite different in an ALP/LIB/GRN 3cp versus an ALP/LIB/IND 3cp.

When backtesting on VIC 2018, the model gets three sets of final-three candidates wrong, and gets the final-two candidates wrong twice, or 5 inaccurate final candidate sets out of 45 tries. Of the incorrect final-3 sets, one of them is fairly innocuous (in Morwell, instead of ALP/IND/NAT, the model retrodicts ALP/IND/LIB), while in the other two, the inaccuracies would not have changed the predicted winner (in Bass, the model retrodicts ALP/LIB/IND when the result was ALP/LIB/GRN; while in Shepparton, the model retrodicts IND/LIB/ALP instead of IND/LIB/NAT).

The final-two inaccuracies are more serious as they could have led to inaccurate retrodictions as to who would win the seat. In particular, in Prahran – where the Green won – the model retrodicts a final two of LIB/ALP instead of the actual final two of LIB/GRN. The error would have mattered less in Melton, where the model retrodicted a final two of ALP/IND while the actual final two was ALP/LIB – at the end of the day however both would see a Labor victory. In both cases the actual margin between the 2nd- and 3rd-placed candidates (matchup margin) was very narrow (0.65% in Prahran, 1.86% in Melton) and if this model was applied to an election forecast, these kinds of error would be tiny enough to blend into the background of polling error and forecast uncertainty.

How accurate are the model’s retrodictions?

To calculate this, we simply look at all vote shares where the same candidate appeared in both the model’s retrodicted final three and the actual final three candidates. So e.g. in Morwell, where the model retrodicts a final three of ALP/IND/NAT while the actual result was ALP/IND/LIB, we’d compare the Labor and Independent 3cp vote shares.
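The backtest loop and this comparison rule can be sketched together in Python. Everything named here is hypothetical. `model` stands in for the whole preference-distribution procedure, the stub below just returns fixed numbers, and the two electorates are made-up data.

```python
def leave_one_out_error(electorates, model):
    """Hide each electorate in turn, retrodict its 3cp, and average the absolute errors.

    Only parties present in both the retrodicted and the actual final three
    are compared.
    """
    errors = []
    for name, actual in electorates.items():
        training = {k: v for k, v in electorates.items() if k != name}
        retrodicted = model(training, name)
        errors.extend(abs(retrodicted[party] - actual[party])
                      for party in retrodicted if party in actual)
    return sum(errors) / len(errors)


def stub_model(training, name):
    """PLACEHOLDER: a real model would use the training electorates."""
    return {"ALP": 40.0, "LIB": 35.0, "GRN": 25.0}


toy_results = {
    "Seat A": {"ALP": 41.0, "LIB": 34.0, "GRN": 25.0},
    "Seat B": {"ALP": 38.0, "LIB": 36.0, "IND": 26.0},  # GRN missed the final three
}
mean_error = leave_one_out_error(toy_results, stub_model)
print(mean_error)
```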

Retrodicted 3cp vs actual 3cp, backtesting from the 2018 Victorian state election

Electorates where at least one party in the final three was incorrect were coloured differently, without taking ordering errors into consideration (so e.g. Prahran, where the model also said the final three would include a Labor, a Liberal and a Green candidate, was not coloured differently). Overall, there’s a pretty strong correlation between the retrodicted 3cp by the model and the actual 3cp from the election.

Plotting the full set of retrodiction errors (retrodicted 3cp – actual 3cp):

Histogram of retrodiction error at the 2018 Victorian state election. — Fitted distribution is a Gosset’s t-distribution with df = 5 and ν = 1.025.

On average, the retrodicted 3cp differed from the actual 3cp by 0.87%, and the margin of error (i.e. the 95% confidence interval) was about ±3%. Additionally, there appeared to be no tendency for one party to be over- or under-estimated in the 3cp:

Histograms of retrodiction error, divided up by party, at the 2018 Victorian state election

Hence, the preference distribution model appears reasonably reliable, with error rates similar to that of a theoretically-perfect n = 1000 pollster. It’s plausible that the error rate is slightly higher on out-of-sample electorates (e.g. maybe preference flows are very different in safe seats), but it’s unlikely to be so high as to completely invalidate the estimates produced.
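For reference, the pollster comparison uses the usual simple-random-sample margin of error, worked through here in Python. `margin_of_error` is a hypothetical helper, and the formula assumes a perfectly random sample with a vote share near 50%.

```python
import math


def margin_of_error(sample_size, share=0.5, z=1.96):
    """95% margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(share * (1 - share) / sample_size)


print(f"{100 * margin_of_error(1000):.1f}%")
```

For n = 1000 this gives about ±3.1%, in line with the roughly ±3% range seen in the backtest.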

Go to the 3cp explorer for the 2018 Victorian state election >>
