Toss hits 43% in GM

WARNING: Giant technical post incoming, I don’t blame you if you don’t read it lmao.

The chart does leave out a lot of info. It uses Elo to rate every player in Aligulac with more than 100 games in each matchup who has played at least 1 game since January 1st, 2023. It then converts the Elo scores to a bell curve. This is useful because it lets us compare how each group behaves as a whole. Mirror matchups, for example, have no balance issues: each player has access to the exact same tools, so the matchup is symmetrical, and the only variables affecting the outcome are skill and luck. That makes a player’s mirror rating a usable proxy for true skill, IF you assume the PvP, ZvZ and TvT pools have roughly the same parameters. There are tests we could run to verify this, but let’s assume it’s approximately true. That gives us a way to define a player’s skill: their Elo rating in the mirror matchup.
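A minimal Python sketch of the player filter and the bell-curve step. The `Player` record, its field names and the `MATCHUPS` table are hypothetical (this is not Aligulac’s data format or API), and reading “converts the Elo scores to a bell curve” as standardizing ratings into z-scores is an assumption.

```python
from dataclasses import dataclass, field
from datetime import date
from statistics import mean, pstdev

# Matchups each race plays; the mirror is listed first.
MATCHUPS = {
    "P": ("PvP", "PvT", "PvZ"),
    "T": ("TvT", "TvP", "TvZ"),
    "Z": ("ZvZ", "ZvP", "ZvT"),
}


@dataclass
class Player:
    """Hypothetical player record, not Aligulac's actual format."""
    name: str
    race: str                                    # "P", "T" or "Z"
    last_game: date                              # date of most recent game
    games: dict = field(default_factory=dict)    # matchup -> games played
    ratings: dict = field(default_factory=dict)  # matchup -> Elo rating


def is_eligible(p, min_games=100, since=date(2023, 1, 1)):
    """More than `min_games` in every matchup and at least one game since `since`."""
    return p.last_game >= since and all(
        p.games.get(m, 0) > min_games for m in MATCHUPS[p.race]
    )


def standardize(values):
    """Map ratings onto a standard bell curve (z-scores: mean 0, std dev 1)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]
```

Usage would look like `pool = [p for p in players if is_eligible(p)]`, then `standardize([p.ratings["PvP"] for p in pool if p.race == "P"])` to put every protoss on the same scale.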

We can now compare each player’s performance in the non-mirror matchups to what we’d expect from their mirror, and check for a trend. For example, maybe toss is imbalanced but it takes very high skill to exploit it. If that’s true, high-skill toss with a high PvP rating will have an unusually high PvZ or PvT compared to other protoss, and that’s something we can test here. If we graph each player’s PvP on the X axis and their PvZ minus their PvP on the Y axis, we’d expect a horizontal line if there is no relationship between the two. A positive slope would mean high-skill players outperform their mirror matchup; a negative slope would mean they underperform relative to it. Interestingly, the slope is negative: high-skill players tend to be best in their mirror matchup and underperform in the other matchups by comparison.
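The slope test can be sketched with a plain least-squares fit of the gap (cross matchup minus mirror) against the mirror rating. The function names and the example ratings below are made up for illustration; they are not the data behind the chart.

```python
def ols(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def mirror_gap_slope(mirror, cross):
    """Slope of (cross - mirror) against mirror, e.g. (PvZ - PvP) vs PvP.

    ~0: no skill dependence; > 0: stronger players outperform their mirror
    in the cross matchup; < 0: they underperform relative to their mirror.
    """
    gaps = [c - m for m, c in zip(mirror, cross)]
    slope, _intercept = ols(mirror, gaps)
    return slope


# Hypothetical PvP and PvZ ratings for three players.
print(mirror_gap_slope([1800, 2000, 2200], [1850, 2000, 2150]))  # -0.25
```

In this toy data the strongest player’s PvZ trails their PvP by 50, so the slope comes out negative, the same shape the chart shows.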

What’s shocking, though, is what you see when you compare this relationship across matchups. If balance trends correspond with skill level, we would see a negative slope in ZvP and a positive slope in PvZ, meaning the performance of high-skill toss goes up in PvZ and the performance of high-skill zergs goes down in ZvP. That would mean not only a balance issue, but one that is only a problem for the highest-tier players. That’s not what we see, though: all the matchups have virtually identical lines. According to this test, the idea that balance shifts with skill level is simply wrong.

That brings us to the next point. I’ve said it a million times on the forums, but if something is caused by protoss, it must affect everyone who plays protoss. If there is a balance issue, we’d expect the whole protoss group to move up in performance. That scenario would not be visible in the test above, which is designed to detect a relationship between balance and skill level; if protoss as a group all move up together, that test shows nothing.

On this new test, we will use linear regression to find a line that represents the relationship between PvP and PvZ.

There are two parameters to a typical line: the slope and the y-intercept. The slope tells us something similar to the previous test, so we’re going to ignore it. The intercept tells us the average performance of the group: positive means over-performance, negative means under-performance. Here is that chart: https://i.imgur.com/Oa6EAON.png. Protoss have a mean of 39.3 and zerg a mean of -17.6. So toss perform about 39 Elo higher in PvZ than their PvP would predict, and zergs perform about 18 Elo lower in ZvP than their ZvZ would predict. That’s a difference of 57 Elo, which comes out to a 58% win rate for protoss in PvZ. This factors skill out of the equation if you assume the pool of PvP players is roughly the same in skill as the pool of ZvZ players (there are reasons to doubt that, e.g. APM/SPM, supply-block time, how many resources they float, etc., but that’s a story for another post).
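The 57 Elo to 58% conversion follows the standard Elo expected-score curve, E = 1 / (1 + 10^(-d/400)), where d is the rating difference. A quick check in Python, plugging in the two means from the chart and assuming the ratings behave like standard Elo:

```python
def elo_win_probability(diff):
    """Expected score for the side rated `diff` Elo points higher."""
    return 1 / (1 + 10 ** (-diff / 400))


protoss_offset = 39.3  # PvZ performance vs. what PvP predicts (chart mean)
zerg_offset = -17.6    # ZvP performance vs. what ZvZ predicts (chart mean)
gap = protoss_offset - zerg_offset  # about 57 Elo

print(f"{gap:.1f} Elo -> {elo_win_probability(gap):.1%}")  # 56.9 Elo -> 58.1%
```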

Another point worth making is that this is the average performance of the entire pro scene as a group, and Elo tends to lag: if players don’t play games, their ratings don’t update. Say, hypothetically, we buffed protoss to a 90% win rate. It would take time for the whole population to reflect that, because players have to play games, win or lose, and their ratings have to update. If every player plays 1,000 games, the ratings update quickly; if every player plays 0 games, they never update. This data is probably somewhere in between. So this isn’t the final win rate; it’s a snapshot of the current average as it moves toward the final win rate. It’s like inertia in physics: a heavy ball is harder to accelerate than a light one, and here the ball gets heavier the less often people play.
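To see the lag, here’s a deterministic toy model: a single player rated 0 who is suddenly 90% favored against opponents rated 0, updated with the plain Elo rule using the expected result of each game instead of random wins and losses. The K-factor of 32 and the function names are assumptions for illustration, not Aligulac’s actual update rule.

```python
import math


def expected_score(diff):
    """Standard Elo expected score for a rating advantage of `diff`."""
    return 1 / (1 + 10 ** (-diff / 400))


def rating_after(games, true_diff, k=32.0, start=0.0):
    """Rating after `games` games vs. 0-rated opponents, given true strength `true_diff`.

    Each game moves the rating by k * (true expected score - predicted score),
    so it creeps toward `true_diff` but never jumps there.
    """
    r = start
    for _ in range(games):
        r += k * (expected_score(true_diff) - expected_score(r))
    return r


# Elo advantage that corresponds to a 90% win rate (about 382 points).
TRUE_DIFF = 400 * math.log10(9)

for n in (0, 10, 50, 200):
    print(n, round(rating_after(n, TRUE_DIFF), 1), "of", round(TRUE_DIFF, 1))
```

The fewer games played, the further the printed rating sits below the true 90% level, which is the “heavy ball” effect described above.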

To estimate the final win rate, we would have to measure how the win rate changes over time, which is more complicated. The takeaway is that the group resists change, which means we are probably underestimating the win-rate difference.

We can solve this problem numerically. We simulate players playing games until we get data that looks like the pro scene. Then we hit PvZ with a massive protoss buff by setting the PvZ win rate to 90%, and track how many games it takes for the ratings to fully update. We repeat this many times with random values for who is favored and how many games are played per day, and fit a polynomial equation to the results, which should give a very tight fit. Plugging the current scenario into that equation outputs the final win rate and the time it takes to get there. It’s a big task and it’s not worth it: PvZ is busted, and there’s plenty of evidence to show it.
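A rough sketch of just the simulation step, under heavy simplifying assumptions: a mean-field model where one protoss rating and one zerg rating play only each other, each game updates both by the expected (not random) result using plain Elo with K = 32, and “fully updated” is taken to mean the rating gap reaches 95% of its equilibrium value. The names and thresholds are hypothetical, and the polynomial-fitting step is left out.

```python
import math


def expected_score(diff):
    """Standard Elo expected score for a rating advantage of `diff`."""
    return 1 / (1 + 10 ** (-diff / 400))


def games_to_converge(buff_winrate, k=32.0, threshold=0.95, max_games=100_000):
    """Games of P vs Z until the rating gap reaches `threshold` of its equilibrium."""
    # Rating gap at which Elo predicts exactly `buff_winrate` for protoss.
    target = 400 * math.log10(buff_winrate / (1 - buff_winrate))
    r_p = r_z = 0.0  # both races start level, as before the buff
    for game in range(1, max_games + 1):
        delta = k * (buff_winrate - expected_score(r_p - r_z))
        r_p += delta  # protoss gains what zerg loses
        r_z -= delta
        if abs(r_p - r_z) >= threshold * abs(target):
            return game
    return None  # did not converge within max_games


def days_to_converge(buff_winrate, games_per_day, **kwargs):
    """Convert games-to-converge into days at a given play rate."""
    games = games_to_converge(buff_winrate, **kwargs)
    return None if games is None else math.ceil(games / games_per_day)


for gpd in (0.5, 1, 3):
    print(gpd, days_to_converge(0.9, gpd))
```

Sweeping `buff_winrate` and `games_per_day` over random values and collecting the outputs would give the data points for the fit; a real version would also use random game outcomes and a full player pool.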

This definitely shows toss is busted, but it doesn’t show a balance perimeter. To me, a perimeter means “within this group, we see unusually high performance that isn’t seen elsewhere.” The test I used doesn’t indicate that balance changes within certain skill clusters. It indicates that protoss as a group all benefit from the same amount of imbalance; in other words, the imbalance applies equally.

This test has only been done on the pro scene. Balance might vary on the ladder, which has a broader spectrum of skill levels. I haven’t tested it.