This is what I was talking about with people not knowing the basics of how to approach these issues. Introductory statistics:
In introductory statistics, the null hypothesis (often denoted as H₀) is a statement of no effect or no change. It typically represents the status quo or the widely accepted belief, which a researcher then attempts to disprove with data. Essentially, it’s the hypothesis that there’s no relationship between the variables being studied.
The default assumption is that there is no association between Protoss and other variables. We use a hypothesis test to see if there are relationships between Protoss and other variables, like skill & performance:
In introductory statistics, a hypothesis test is a procedure used to determine if there’s enough evidence to reject a claim (null hypothesis) about a population parameter. It involves collecting data, analyzing it, and making a decision based on the results. The process helps researchers understand if observed differences in a sample are likely due to chance or if they suggest a real effect.
In the case of Grandmaster, the hypothesis is that Protoss has higher performance, which makes Protoss players more likely to reach Grandmaster. We assume this is not true, that Protoss has equal odds to be GM as Terran and Zerg, and calculate the probability that the observed result can happen under this assumption, aka the null hypothesis H₀, using a binomial calculator:
https://i.imgur.com/gPbTMWk.png
Under H₀, the odds of this happening by chance are 1 in 171,824, or about 0.0006%. In science, 1 in 20 odds is considered significant. That means there is a 99.999% chance that Protoss and increased performance are related. It is the definition of a sound statistical conclusion.
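If you want to reproduce this kind of number without a web calculator, here is a minimal Python sketch of a one-sided binomial tail under H₀. The names `binomial_tail` and `one_in_odds_to_percent` are my own, and the default p = 1/3 is an assumption (three races, equal odds). The actual GM counts are in the screenshot and are not reproduced here, so plug in your own.

```python
from math import comb

def binomial_tail(k: int, n: int, p: float = 1 / 3) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more successes under H0."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def one_in_odds_to_percent(odds: int) -> float:
    """Convert '1 in N' odds to a percentage."""
    return 100.0 / odds

# Tiny sanity case: two fair coin flips, both heads.
print(binomial_tail(2, 2, 0.5))

# The "1 in 171,824" figure expressed as a percentage, rounded to 4 decimals.
print(round(one_in_odds_to_percent(171_824), 4))
```

Usage would be `binomial_tail(protoss_in_gm, total_in_gm)` with whatever counts you pull from the ladder; a result below 0.05 clears the 1-in-20 bar.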
Now, what is the mechanism? Typical responses:
- “What if all the smurfs play Protoss?” → violates the H₀ hypothesis because you’re assuming there is a relation between Protoss and odds to smurf, even though the odds of that being true are 0.0006%.
- “What if the Protoss are more skilled?” → violates the H₀ hypothesis because you’re assuming there is a relation between skill and race selection, even though the odds of that being true are 0.0006%.
This alone is very strong evidence that Protoss, and Protoss alone, is the cause of increased performance, but we can do a hypothesis test that Protoss are more skilled. This one is a little more complicated. Instead of a binomial probability, we need a z-test:
- The R² value between MMR and APM is 0.42, which means 42% of the variation in MMR can be explained with APM. That leaves 58% of MMR unexplained. This tells us how much APM varies from performance on an individual basis. We want to know how many APM measurements have to be averaged together in order to shrink the 58% unexplained factors to 0%, using the central limit theorem, which is equivalent to saying the average APM perfectly predicts the average skill level of those in the group.
- The sample variance formula is s^2 = ∑(Xi - x̄)^2 / (n - 1). This tells us the variance shrinks by n - 1. That means we can create an equation to calculate n:
s2 = 0.58/(n-1)
(we want a small s2 value, which I choose as 0.001)
0.001 = 0.58/(n - 1)
(solve for n)
0.001*(n - 1) = 0.58
n - 1 = 0.58 / 0.001
n = 0.58 / 0.001 + 1
n = 581
- This calculation shows that if we have a sample of >=581 players, the variability of non-APM factors will shrink to s2<=0.001, which is equivalent to saying they don’t impact the measurement.
- Grandmaster has 600 players. What is the average APM for each race in Grandmaster? The average for the past year was:
| Race | APM Avg |
| --- | --- |
| T | 258 |
| P | 241 |
| Z | 322 |

Source: https://i.imgur.com/SEl0awm.png
- Since n=~600, these APM values perfectly correlate with performance. This definitely proves Protoss in GM are substantially worse than their Terran and Zerg counterparts.
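The sample-size arithmetic above is easy to re-run with different thresholds. This is just a sketch of the same algebra in Python; the function names are mine, and the 0.001 target is the arbitrary cutoff chosen above, not a standard value.

```python
def unexplained_share(r_squared: float) -> float:
    """Share of variation in MMR left unexplained by APM (1 - R^2)."""
    return 1.0 - r_squared

def required_n(unexplained: float, target_s2: float) -> float:
    """Solve target_s2 = unexplained / (n - 1) for n."""
    return unexplained / target_s2 + 1

share = unexplained_share(0.42)   # R^2 between MMR and APM from the post
n = required_n(0.58, 0.001)       # 0.001 is the chosen "small enough" s2

# Rounded for display; floating point may add tiny noise to the raw values.
print(round(share, 2), round(n))
```

Picking a stricter target (say 0.0001) just scales n up by the same factor, so the choice of cutoff drives the answer.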
Applying the Bradford-Hill criteria.
The Bradford-Hill criteria provide some tools for double-checking that a relationship isn’t erroneous. Let’s go through them:
- Strength of Association – A strong association between a factor and an outcome makes causality more likely. The correlation in this sample between MMR and APM is very close to 1. The association is strong.

- Consistency – If different studies consistently show the same association, it strengthens the causal argument. Studies that look into this relationship have similar findings. For example: https://www.researchgate.net/publication/380467385_Starcraft_2_Performance_An_In-Depth_Look_At_In-Game_Telemetry_And_Player_Rank These findings are consistent with other research.
- Specificity – If a specific cause leads to a specific effect, it supports causality. How fast you play has a very obvious positive impact on performance in real-time games.

- Temporality – The cause must occur before the effect (i.e., exposure must precede disease). How fast you play in a game obviously precedes the outcome of the game.

- Biological Gradient (Dose-Response Relationship) – Higher exposure levels should generally result in a stronger effect. The correlation of 0.65 shows an obvious linear relationship between APM and MMR.

- Plausibility – The relationship should make biological sense based on known mechanisms. It is very plausible that speed affects performance in real-time games.

- Coherence – The association should not contradict existing knowledge of the disease or condition. Speed impacting performance does not contradict existing knowledge of how RTS games operate. Industry experts universally agree that speed is one of the biggest factors.

- Experiment – If experimental evidence (such as clinical trials) supports the association, it strengthens the case for causality. Low-APM challenges on YouTube show 6.5k mmr pro players struggling to get 5k mmr.

- Analogy – If similar factors are known to cause similar effects, it lends support to causation. We know that in chess having less time to think about your moves reduces the quality of the moves, such as in Blitz chess.

This meets the Bradford-Hill criteria, which means there is definitely a negative relationship between skill and Protoss within the Grandmaster sample, meaning Grandmaster Protoss are reliably less skilled on average than Terrans or Zergs.
We have a positive correlation that shows Protoss have increased performance. We have a negative correlation showing Protoss have decreased skill. What hypothesis could possibly explain this other than that Protoss is imbalanced? Protoss is definitely imbalanced and the odds that this conclusion is wrong are <0.0006%. We can see with certainty that Protoss is overpowered.
The balance council, by contrast, is convinced that Protoss is underpowered. These people are utterly clueless and severely incompetent. They shouldn’t be allowed to manage a McDonald’s, let alone the design of a billion-dollar video game.
By the way, this analysis underestimates the confidence of these conclusions because we are measuring Grandmaster for a single day. The reality is that Protoss have dominated Grandmaster for years, and they’ve had reliably lower APM the entire time. If you calculated the true confidence, it would be somewhere in the ballpark of 99.9999999999999999999999999% probability that toss is busted. Add in how they dominate ESL cups, have had positive win-rates in the pro scene as recorded by Aligulac, etc, and the amount of evidence is truly insurmountable. It’s impossible to get this answer wrong, and the balance council managed to do it. Utterly incompetent doesn’t even begin to describe these people.
To solve this, the balance council members should be required to put $20,000 USD in escrow which they will not get back if they fail to balance the game. This solves the issue in two ways: one, it filters out dummies whose finances are a mess. Two, it provides a very strong incentive to not fail. If you aren’t willing to do this, then you shouldn’t be allowed to work on a billion-dollar game’s design to affect million-dollar tournaments.
Why on Earth these kinds of protections & incentives don’t exist blows my mind. Who is in charge of this sinking ship? We’re just going to let some randos who play video games all day long do whatever they want even when it obviously causes clear and cognizable harm to some of the players? 
There are more ways to measure this, by the way. We can compare how PvZ performance compares to PvP performance: https://imgur.com/R467oTT
A Protoss with a PvP performance of X scores X+30 in PvZ; a Zerg with a ZvZ performance of X scores X-30 in ZvP. Translation: Protoss universally have higher performance in PvZ than in PvP, and the difference is equivalent to a 60% win rate for Protoss.
You might say that 60% is pretty small. But it drastically hurts Zerg players’ ability to win tournament money. Let’s say a Zerg plays a 32-player single-elimination tournament, and that all players are of equal skill. He has to win 5 rounds in a row, so the odds that he wins are 0.5^5=3%. If you reduce his win rate by 10 percentage points, his odds to win go to 0.4^5=1%. It slashes an individual Zerg’s ability to win tournaments by a factor of 3. If this is true, Zergs should win the least tournament money (subtracting outliers):
https://imgur.com/ft3X26V
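The single-elimination arithmetic can be checked with a few lines of Python. This is a sketch under the same simplifying assumptions as the example (32 players, every round has the same win rate); `title_odds` is a name I made up for illustration.

```python
def title_odds(players: int, win_rate: float) -> float:
    """Chance of winning every round of a single-elimination bracket.

    Assumes the bracket size is a power of two and the per-round
    win rate is the same in every round.
    """
    if players < 2 or players & (players - 1) != 0:
        raise ValueError("players must be a power of two")
    rounds = players.bit_length() - 1  # 32 players -> 5 rounds
    return win_rate ** rounds

even = title_odds(32, 0.5)     # all players equal
nerfed = title_odds(32, 0.4)   # win rate cut by 10 percentage points

# Rounded for display: roughly 3%, 1%, and a factor of about 3.
print(round(even * 100, 1), round(nerfed * 100, 1), round(even / nerfed, 2))
```

Because the win rate is raised to the number of rounds, a small per-game edge compounds quickly in bigger brackets.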
As expected, Protoss tournament winnings, after subtracting outliers, perfectly mirror Grandmaster representation. We’ve unified Grandmaster and pro play under the same umbrella, and we’ve shown that Protoss’ advantage can be measured in multiple ways. We’ve shown that skill reinforces the theory that Protoss is advantaged, because Protoss are less skilled according to skill metric measurements. What more do we need? The level of delusion surrounding SC2’s balance is equivalent to having your credit cards sent to collections BUT still believing you are a millionaire. The belief that Protoss is underpowered is absolutely, unequivocally, delusional.