Visualizing balance in the simplest way possible

WOW! NEVER WOULD HAVE GUESSED!

1 Like

Ad hominem attacks aren’t valid arguments. When you ignore an argument and attack the poster it means you don’t have a counter argument and have already lost the debate. Address the topic or don’t post.

Pointing out that you’ve never stopped thinking your race was under-powered in 3 years, is an attack? What a vicious, brutal assault you’ve had to endure! You poor baby, you.

I just imagine people like you being in an airborne unit for a day and seeing you curling up into a ball and crying because some Sergeant hurt your little feelings.

1 Like

ad hominem hŏm′ə-nĕm″, -nəm

  • adj. Attacking a person’s character or motivations rather than a position or argument

Your arguments change to fit the narrative. It’s like boxing with a fart.

One minute it’s even representation in premier tournaments. Then, the instant I prove that Zerg is growing its representation at every stage of every tournament, it becomes Major and Premier tournament wins (also, you just deleted Serral from that, because why not) based on the total number of pros. Now, it’s player skill ranked by mirror matchups.

Bottom line, I’ll actually discuss the possibility of Zerg being UP (now that IT got removed), but not with anyone whose brain is able to rationalize that Zerg players getting hundreds of free points of MMR, for no other reason than picking Zerg, is evidence that the thousands of them are all just magically “better players.”

People like that have zero nuance and apparently aren’t even intelligent enough to come up with a remotely rational argument to defend their biases.

1 Like

Hi BatZ!

I am from the EU server. I just want to say thank you, for all of your hard work.

Terran, especially the Terran early game, is so overpowered vs Zerg that you have to play at a disadvantage for the whole game.
Hopefully the balance team sees this as well.

3 Likes

As usual interesting thought experiments. But then we have a reality where the best Zerg on earth has the hardest time in his mirror match up of zvz. Guess zerg is overpowered :wink:

1 Like

Serral is by definition an outlier. You wouldn’t say gravity is fake because airplanes fly, would you? No? Then SC2 balance trends aren’t invalid because 1 zerg lost a ZvZ match recently. You’re talking about 1 data point compared to 50k data points. The two just aren’t in the same universe in terms of credibility.

2 Likes

My arguments haven’t changed to fit the narrative because the trend inside the data is the narrative. I don’t have control over what the data says. Several years ago I did various analyses and they all proved to be incomplete in some way or another. The adaptation is largely due to forum criticism. For example, past arguments wouldn’t have been able to withstand some of the criticisms that this argument has withstood here, in this thread. They’ve said, for example, that Terran players are just more skilled and that’s why they dominate, but that just isn’t the case, as TvT shows.

In the past I’ve looked at win-rates, representation per-round, representation at the tournament winning level and others. Win-rates are too chaotic to measure balance since there are things that affect win-rates much more than balance (skill, tournament representation, etc). Representation per-round assumes equal skill representation (which is basically never the case). Who wins the tournament is the same problem x100.

This analysis isn’t perfect, either, but it is by far the least flawed analysis of pro-level SC2 balance in existence, period. It has certain flaws: for one, there is some lag, meaning a delay between when an imbalance hits and when it shows up in the stats. Another challenge is the kurtosis of the data, which is unusually low. But the conclusion is solid: the flaws aren’t large enough to have any bearing on it. The probability that the conclusion is valid is exceedingly high (we’re talking 99.99%+).

1 Like

Dude, I’ve literally seen you, in the same thread, whine that TY won a game against someone with higher MMR, then cry that all Terrans have more MMR because of how imba they are.

1 Like

I’ll show what mister Batz gets wrong and what he gets right in this comment, but first of all:

This sentence should be indicative of Batz’s intelligence.

And now what he gets right: Yes, in a perfectly balanced game (be it symmetrical or asymmetrical), mirror matchups can be interpreted as indicative of skill on average. What this means is that even if individual players are more comfortable with some matchup, a large enough sample should even out equally across all 3 matchups per race. This means that it’s not a parameter of skill for individuals, but it is for entire races, IN A PERFECTLY BALANCED GAME.

Now, what he gets wrong: Starcraft 2 is NOT a perfectly balanced game. Why is this a problem? SC2 is asymmetrically balanced, so PvP, ZvZ and TvT have different levels of volatility. What I mean by that is that PvP is really volatile while ZvZ is not (we are talking about high level here, ladder doesn’t count). That means that Showtime can beat Zest, but Lambo cannot beat Dark. That would drag down the top Protosses’ PvP Elo, and their “parameter of skill” with it, so of course their PvZ and PvT Elo would look higher by comparison. This is a flaw that mister Batz is choosing to ignore. Now, assuming that Protoss level of skill is in fact higher than their mirror shows, guess what would happen if Blizzard started making patches to drag PvT and PvZ down to their mirror level.

As for the argument about practice differences per season: players do change their practice for each championship depending on how many Zerg/Protoss/Terran are in it, and that does play a factor, but imo it’s a minor one. Anything that claims to be an in-depth study of balance should either rule this factor out or prove it matters, but that is asking for too much, so we are going with that.

Now, the big reason this attempt to use winrates, or even Elo, can’t solve balance problems is that Starcraft 2 is an asymmetrically balanced game. That means each race has a different number of options for how it wants to play the game, and taking away one option will force the meta to shift to another. Let’s say that in PvZ the Protoss have 3 options to play it: you nerf and remove one option, they will jump to one of the others. Today Protoss open Adepts out of necessity and they get away with close to the same/higher/lower winrate. That will shake the PvZ curve and it will normalize over time, but that doesn’t mean in the slightest that the matchup is balanced.

You’re also the guy who utterly refuses to focus on the topic and instead does relentless personal attacks. In other words, your claims have basically zero credibility. The fact that you ignore the argument alone is proof you have no rebuttal.

1 Like

Your argument is basically the same as Cheezecake’s and it is in essence a rejection of the whole field of statistics. You are saying that matchups aren’t comparable because they are fundamentally different. What you fail to realize is: A) the data of the mirror matchups is virtually identical, so you are wrong about them being different, and B) the point of statistics is to compare things that are fundamentally different. THE point is to look for differences! How do we find differences if things that are different can’t be compared?

By your reasoning, we can’t test drug A and drug B to see which is better at treating heartburn, in the same sense that we can’t test whether race A or race B is better at winning. We also can’t tell if larger steel beams are stronger than smaller ones. Nor can we tell if aluminum is weaker than steel. I mean, they’re fundamentally different metals, right? :laughing:

Statistics is a cornerstone of most applied maths and sciences, which means you are rejecting not just statistics, but mathematics and science as well. Your argument is basically on par with 5th century sophism. You know, the way people used to think a thousand years ago.

Elo was designed specifically to measure skill in zero sum games. That’s all that’s required for it to work. Starcraft 2 is a zero sum game and elo is extremely accurate at modelling it. The SC2 ladder uses a variation of Elo. The AlphaStar team used elo. Chess uses elo. Notice a pattern? It’s good at modelling zero-sum games, period.
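For anyone who hasn’t seen the formula, here is a minimal sketch of the textbook Elo update. The 400-point scale and K-factor of 32 are the conventional chess defaults and are assumptions here; they are not taken from the SC2 ladder or from the data in this thread. The function names are made up for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability-like expected score of player A against player B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a, rating_b, score_a, k=32.0):
    """Return both new ratings after one game.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    Whatever A gains, B loses, so the rating total is conserved.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


# Example: a 1500-rated player upsets a 1700-rated player.
new_a, new_b = update(1500.0, 1700.0, 1.0)
print(new_a, new_b, new_a + new_b)
```

The zero-sum property shows up in the last line: the sum of the two ratings is the same before and after the game.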

Last night I missed your last sentence which is a demonstrably false claim. The CLT states that with an adequate size, a sample will converge on normality and thus a normal distribution model is appropriate:

In probability theory, the central limit theorem (CLT) establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a bell curve) even if the original variables themselves are not normally distributed.

You can also do various tests to see if the data is normally distributed, which I did and showed you and thus proved that it is. The correct model is being used and to state otherwise is factually incorrect.
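As a rough illustration of the quoted theorem (not of the Elo data in this thread), here is a small simulation sketch. It assumes independent exponential draws, which are strongly skewed, and shows that their averages are much closer to symmetric. Skewness is used as a crude stand-in for a proper normality test, and the helper names are made up for the example.

```python
import random
import statistics


def sample_means(n_means, sample_size, seed=0):
    """Draw n_means averages, each over sample_size exponential(1) values."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_means)
    ]


def skewness(values):
    """Population skewness: 0 for a symmetric distribution such as the normal."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])


raw = sample_means(5000, 1)        # single draws: strongly right-skewed
averaged = sample_means(5000, 50)  # averages of 50 draws: far less skewed
print(skewness(raw), skewness(averaged))
```

Note that the theorem is about sums or averages of independent variables, so this sketch says nothing by itself about whether any particular rating dataset is normally distributed.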

You can plainly see the data is normally distributed in this image which compares the actual data to the corresponding normal distribution:

https://i.imgur.com/BfYgDVk.png

I’m denying your invented statistics.

No, it’s not. They are different.

I’ll jump to this because you posted a bunch of jack that I’ll not address.
Now, you said: “Elo was designed specifically to measure skill in zero sum games.”
So, let me highlight for you: “Elo was designed specifically to measure skill in zero sum games.”

There are two problems there. First, you haven’t plotted all players within the SC2 Elo system, so your graphs will not show zero sum. Second, Elo can only do so much for SC2 because it’s not as balanced as Chess. In SC2 you pick a race and go for it; in Chess you play the same amount of white and black, so it all evens out on skill and balance doesn’t play any role in it. So Elo is dependent on balance, not a parameter for balance; it wasn’t even designed to be. It has nothing to do with balance, that’s why skill is highlighted, and that’s even why there are different Elos for standard Chess, blitz, rapid etc…

Yeah. I used to argue with you on an intellectual level. The instant you got proven wrong you called me a “bigot,” and moved your goalposts. You have a narrative in mind, and no amount of facts will change your truth.

2 Likes

There is nothing invented here.

https://i.imgur.com/BfYgDVk.png

Circle the difference.

Are you saying I should dilute the data with players who have only played a few games? That’s literally the opposite of what you want to do. You want to ensure the Elo rankings are accurate (that they match the player’s skill), and that requires that certain conditions are met, such as having played enough games.

*In asymmetrical matchups, not the symmetrical ones. The symmetry of SC2 mirror matchups is actually greater than the symmetry of chess. We can measure how much a player’s rank changes in an asymmetrical match relative to their rank in symmetrical ones, thus measuring how the asymmetry affects performance. We must do this as an average of many players, since there will be an inherent variability.

If asymmetry does not affect performance, players should have (on average) the same rank in asymmetrical matchups as they do in symmetrical ones. That isn’t the case. Terran does worse in TvP but better in TvZ. Protoss does better in PvT and PvZ. Zerg does worse in ZvT and ZvP.

The only thing that is altered between these groups is the symmetry, so the performance change within those groups must be a product of the asymmetry of the matchups (aka they are imbalanced).
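The comparison described above can be sketched roughly like this. It is only an illustration under assumptions: each player has one rating in their mirror matchup and one in a non-mirror matchup, the dictionaries and the function name are hypothetical, and the numbers in the example are made up rather than taken from the real data.

```python
from math import sqrt
from statistics import fmean, stdev


def mean_rating_shift(mirror, non_mirror):
    """Average (non-mirror rating - mirror rating) over shared players.

    Returns (mean shift, standard error of the mean). A shift near zero,
    relative to its standard error, is what you would expect if the
    asymmetry did not affect performance.
    """
    diffs = [non_mirror[p] - mirror[p] for p in mirror if p in non_mirror]
    if len(diffs) < 2:
        raise ValueError("need at least two players with both ratings")
    return fmean(diffs), stdev(diffs) / sqrt(len(diffs))


# Made-up ratings for three hypothetical players.
mirror = {"a": 1500.0, "b": 1600.0, "c": 1700.0}
non_mirror = {"a": 1520.0, "b": 1610.0, "c": 1730.0}
shift, err = mean_rating_shift(mirror, non_mirror)
print(shift, err)
```

Averaging over many players is what keeps individual quirks (someone who is just bad at one matchup) from dominating the result.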

I don’t always agree with Batz on stuff, but on this one I really believe it is a perfectly fine approach. Granted, my statistical knowledge is more applied to the field of point-of-failure prediction, but most of the things he does that people have argued against are simplifications based on very reasonable assumptions. Cutting the sample down to the top ranks might not result in an exactly zero-sum situation, but you get a stronger sample (whatever the word is for a less diluted sample, English is not my main language), without players whose skill varies wildly from week to week depending on whether they play 20 games a day that week and then none for two months or whatever. That, plus the faster working time, is a much bigger upside than the downside of some loss in accuracy from the model not being exactly zero sum.

1 Like

Well, idk what this graph is, but I see PvP right at the bottom, below ZvZ and TvT. Even Harstem did a topic showing how inconsistent PvP is and brought up that average PvP Elos are the lowest of them all; not gonna search for it. And the fact that PvP is inconsistent is well known, even Blizzard addressed it last patch. So you are wrong and your own graph kind of shows it.

Yes, and you are comparing 3 differently symmetrical games, each with 2 other differently asymmetrical ones.

No you can’t, that’s you inventing math.

We could think about it in a balanced game. But if you claim that the game isn’t balanced then your math is wrong.

I agree, and I’m not even making that a big deal. What he is not accepting is that he can’t treat PvP, ZvZ and TvT as being the same and call it a parameter of skill, since PvP is not as reliable as ZvZ. And this is his premise, so nothing that comes after it can be proved by this.

The root cause of all of this is that everyone takes skill to be a different set of parameters, and it is almost impossible to reach a consensus. I do believe, though, that the most unbiased measure of the skill of a set of players can be normalized through the mirror matchups. Let me explain the thought process behind what I would do (yes, this is how I personally would measure skill, and I’m not claiming it is the perfect answer).

1-In mirror matchups, balance is not an issue in the slightest.
2-We can assume that, whatever coinflip dumpsterfire knife fight in a dark cupboard a matchup is (looking at you PvP), over a large number of games a more skilled player will defeat lesser skilled players more times than they defeat him; the bigger the skill difference, the higher the more skilled player’s win percentage. This is pretty well represented, at least in mirror matchups, by rank in that matchup.
3-With the exception of the very very top, we can assume that there are enough players playing Starcraft that the skill distribution of the three races behaves similarly (players of any skill level having equal odds of choosing any particular race). Here we run into the first problem, which is that Terran is the race most picked by beginners, so we need some kind of filter like “at least X games played” where X is, for example, 300. Whatever, some way of filtering that.
4-So, with these premises and assumptions, the only thing left would be to analyze the distribution of those players’ ranks for each race and standardize. If they were to follow, say, a normal distribution, you would subtract the mean rank and divide by the standard deviation. You could then use these variables to place players on the same distribution curve regardless of their race, as long as you take their rank, subtract the mean rank for their race and then divide by the standard deviation of their race’s rank (assuming a normal distribution for the sake of the example). You could then compare which of any two selected players is “more skilled” than the other regardless of their race selection. Say you have a Protoss with a standardized rank of 1.2 against a Zerg with a standardized rank of 1.3: you’d expect the Zerg player to be the more skilled player, even if their ranks were 1140 and 800 respectively in their mirror matchups. The “reliability” of each mirror matchup in having the better player win would, in this case, be taken care of by the standard deviation. In a very volatile matchup, where who wins is close to random rather than decided by skill, players would be more clumped up (because luck balances itself out over a lot of games), and so the standard deviation would be smaller.
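The standardization in step 4 could be sketched like this. It is only a rough illustration: the nested dictionary layout, the function name and the numbers are all made up, and it assumes each race’s mirror ratings are roughly normal, as in the example above.

```python
from statistics import fmean, pstdev


def standardize_by_race(ratings_by_race):
    """Turn mirror-matchup ratings into per-race z-scores.

    ratings_by_race maps race -> {player: mirror rating}.
    Returns {player: (rating - race mean) / race standard deviation}.
    """
    z_scores = {}
    for race, ratings in ratings_by_race.items():
        values = list(ratings.values())
        mean = fmean(values)
        sd = pstdev(values)
        if sd == 0:
            raise ValueError(f"no spread in ratings for {race}")
        for player, rating in ratings.items():
            z_scores[player] = (rating - mean) / sd
    return z_scores


# Made-up ratings: the Zerg numbers are lower but have the same spread.
toy = {
    "Protoss": {"p1": 1000.0, "p2": 1200.0},
    "Zerg": {"z1": 700.0, "z2": 900.0},
}
print(standardize_by_race(toy))
```

In this toy case p2 and z2 end up with the same standardized score even though their raw ratings differ by 300, which is the kind of cross-race comparison the step describes.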

I don’t know, it’s sleeping time for me. My point is that I do think that you can use rank in mirror matchups as an overall indicator of skill. The problem is that we now run into different races requiring different skills and such, but… assumptions always have to be made.

1 Like