The effect of skill, quantified. DR278

I’m not measuring skill. I’m measuring the effect of skill on winrate. It’s the difference between measuring gallons of gasoline and measuring the miles per gallon of a vehicle.

That top 1000 Legend players, on average, have more skill than Diamond 4-1 players, on average, is self-evident. But there is no serious attempt to measure this skill gap. It’s a little bit like the AU in astronomy. What’s the distance between the Earth and the Sun? Well, whatever it is, we’ll call that distance an Astronomical Unit, and therefore the distance is 1 AU. Is “1 AU” a measurement? I’d say no, because we substituted definition for measurement. The winrate differences presented in the final chart are then in units of winrate per “skill unit,” which I don’t consider a measurement of skill either.

In short, I’m not measuring skill and I’m not pretending to measure skill. What I am measuring, and accurately, is the effect of skill upon winrate. That’s why the title of the thread is “the effect of skill quantified” and not “skill quantified.”

The rest of your post is just an attempt to avoid measuring anything, because facts are your enemy.

You might argue that my priorities are messed up, but I’ve been more concerned with things like people saying that I’m measuring skill when I’m not. That adds extra words.

Since I’m perhaps a bit too scared of this, how about YOU write the summary for normal people? I’d appreciate it and I honestly think you could do a better job than I could.

3 Likes

No, they are firmly on my side, but you lack the knowledge to know why.

You are not measuring the effect of skill.

Never mind that many of your “effects” are within your own described rounding error; you don’t even know if your observations are independent.

You can’t handwave this away:

And you do nothing to address this. Nothing.

This whole thing is you making up numbers and lying to everyone about your own knowledge.

This doesn’t quantify skill. At all.

But knowing how this data is collected, it could also mean that the best players stop playing hunter as they climb to the top of the meta, since the playrate dips significantly there.

It’s completely a chicken egg problem and this analysis does NOTHING to rule this out.

What is self-evident is that fewer players choose to play hunter at top 1k, and nothing about VS data answers why that change occurs. Any attempt to call that change “skill” or even “quantified skill” is bull…

1 Like

It’s a small part of the data, but top 1k players that are still using hunter are still going to be among the best hunter players available in the game.

It’s not going to be enough of an impact to fully skew the findings, because the deck still exists up there. People of the highest skill are still playing it.

Players abandoning a deck up there can sometimes leave the matchup data too sparse to do much with statistically, though. That’s why it’s easier to just drop the decks that aren’t seriously played in the top 1k from the analysis.

Good players abandoning a deck can magnify the apparent weakness of the deck a bit, but it generally doesn’t cause the drop-off in performance. People leave a deck when it stops reliably beating a certain matchup spread, which makes the skill pool of the deck worse, which loops back into the data.

If the matchup spread doesn’t get worse, there generally isn’t an abandonment of the deck in the first place.

2 Likes

I don’t think that’s true at all, he just needs to calculate the error margin.

1 Like

That data in particular is not going to have ANY impact on the findings.

As I explained before, the methodology is:
P(D)×M(D) → P(L)×M(D) → P(L)×M(L)
So the middle meta is a hypothetical made out of Diamond matchups and Legend popularity, the first arrow isolates meta and the second arrow isolates matchups (skill). The first meta is just Diamond winrates and the final is just Legend winrates.

Top 1000 Legend matchup data was never collected because it’s unnecessary. It’s implied in the top 1000 Legend overall winrates. The data collected is exclusively for the purposes of constructing the hypothetical middle meta.

Using a method that was dependent upon top 1000 legend matchup data was specifically avoided because of the scarcity of such data. We do not need to use it so we do not.
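To make the construction concrete, here’s a rough Python sketch of those three steps. The deck names, popularity shares, and matchup winrates are all invented for illustration; real values would come from the VS report.

```python
# Sketch of the P(D)xM(D) -> P(L)xM(D) -> P(L)xM(L) construction.
# Every deck name and number below is made up purely for illustration.

def overall_winrate(deck, popularity, matchups):
    """Popularity-weighted winrate of `deck` against the whole field."""
    return sum(share * matchups[deck][opp] for opp, share in popularity.items())

# Popularity (play share) at Diamond 4-1 and at top 1000 Legend.
pop_diamond = {"Mech Rogue": 0.40, "Enrage Warrior": 0.35, "Rainbow Mage": 0.25}
pop_legend  = {"Mech Rogue": 0.20, "Enrage Warrior": 0.50, "Rainbow Mage": 0.30}

# Matchup winrates measured at Diamond 4-1 (row deck vs. column deck).
m_diamond = {
    "Mech Rogue":     {"Mech Rogue": 0.50, "Enrage Warrior": 0.55, "Rainbow Mage": 0.45},
    "Enrage Warrior": {"Mech Rogue": 0.45, "Enrage Warrior": 0.50, "Rainbow Mage": 0.60},
    "Rainbow Mage":   {"Mech Rogue": 0.55, "Enrage Warrior": 0.40, "Rainbow Mage": 0.50},
}

for deck in pop_diamond:
    diamond_wr      = overall_winrate(deck, pop_diamond, m_diamond)  # P(D) x M(D)
    hypothetical_wr = overall_winrate(deck, pop_legend,  m_diamond)  # P(L) x M(D)
    print(f"{deck}: Diamond {diamond_wr:.1%} -> hypothetical middle meta {hypothetical_wr:.1%}")

# The third meta, P(L) x M(L), is simply the observed top 1000 Legend winrate,
# which is why Legend matchup data never needs to be collected for this step.
```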

1 Like

Thanks for the reply. Now I see what you’ve done: you applied the Diamond deck matchup win rates onto the deck distribution at Legend.

Now can you explain your Part 3, what’s the exact calculation of the “skill diff” and “% skill”?

1 Like

“Skill diff” is a nickname for the difference, for the same deck, between its winrate in the hypothetical meta (Diamond matchups with top 1000 Legend popularity) and its Vicious Syndicate top 1000 Legend winrate.

I didn’t display it but “meta diff” would be the same thing, but for the difference between the Vicious Syndicate Diamond 4-1 winrate and the hypothetical meta winrate.

“Skill %” is the absolute value of “skill diff” (so “skill diff” without any negative sign if it has one) divided by the sum of the absolute values of “skill diff” and “meta diff”. So if “skill diff” is 3% and “meta diff” is -1%, then “skill %” would be 3/(3+1), or 75%.
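As a sanity check, here’s that arithmetic as a tiny Python sketch. The direction of each subtraction and the example winrates are my own assumptions, chosen so that the output reproduces the 3% / -1% / 75% example above.

```python
# Sketch of the "skill diff" / "meta diff" / "skill %" arithmetic.
# The sign conventions and the example winrates are assumptions for illustration.

def skill_breakdown(diamond_wr, hypothetical_wr, legend_wr):
    meta_diff  = hypothetical_wr - diamond_wr   # effect of swapping in Legend popularity
    skill_diff = legend_wr - hypothetical_wr    # effect of swapping in Legend matchups
    skill_pct  = abs(skill_diff) / (abs(skill_diff) + abs(meta_diff))
    return meta_diff, skill_diff, skill_pct

# Illustrative deck: 52% at Diamond 4-1, 51% in the hypothetical middle meta,
# 54% at top 1000 Legend.
meta_diff, skill_diff, skill_pct = skill_breakdown(0.52, 0.51, 0.54)
print(f"meta diff {meta_diff:+.0%}, skill diff {skill_diff:+.0%}, skill % {skill_pct:.0%}")
# -> meta diff -1%, skill diff +3%, skill % 75%
```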

1 Like

I mean, it’s in there as part of the aggregate win rates you are pulling from top 1k to compare the delta, even if you aren’t directly using it in your in-between calculation, which is a good way to show how much the meta is or is not explaining win rate differences.

Because if it’s not the meta, really that just leaves skill as the difference.

1 Like

I couldn’t really write a summary honestly without looking at the data and doing all the work myself.

However, my take on what constitutes skill will not jibe with what people want to hear, despite me giving a fairly sound logical observation on what defines skill in this game.

For me, logically, the more choices and the more options that you are presented with, the more likely you are to make a mistake. The more chances you are given to make a mistake, the more skill is involved overall.

For example, if winning with a deck requires correctly choosing between options A and B 3 times in a row, and I’m guessing at random, I have a 12.5% chance of piloting the deck correctly (50% * 50% * 50%).

If 1 deck gives me 20 choices throughout a game and another deck gives me 200 choices throughout the game, it logically follows that the deck that requires me to make 200 choices is harder to pilot correctly.
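For what it’s worth, that intuition is easy to put into numbers. A minimal sketch, assuming every decision is made correctly with the same independent probability (the 95% figure is made up):

```python
# Chance of piloting an entire game without a single mistake, assuming each
# decision is independent and made correctly with the same probability.
# The 95% per-decision accuracy is an invented number for illustration.

def perfect_game_chance(per_decision_accuracy, num_decisions):
    return per_decision_accuracy ** num_decisions

for n in (3, 20, 200):
    print(f"{n:>3} decisions: {perfect_game_chance(0.95, n):.4%} chance of a mistake-free game")

# With coin-flip (50%) accuracy and 3 decisions this is 0.5 ** 3 = 12.5%,
# the example from the post above.
```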

Thus, bots pilot the decks that require the fewest choices (aggro decks). You never see bots piloting a control deck, because the programming required is far more involved.

So in essence, the more programming required to pilot a deck, the harder the deck is to pilot, and thus the more skill it takes to pilot correctly. In the few “data to show skill” threads we’ve had, we’ve generally seen this to be the case.

But people don’t want to hear this. So putting forth the effort just to show the data seems meaningless for me to do, because 1) people will automatically assume I’m biased because I play control, and 2) even if the data showed I was correct, they would point to point 1 and dismiss everything.

So, kudos to you for trying, but as you can see already, people won’t buy it.

The only thing people want to see is this:

“The deck you are currently using and you love takes the most skill to pilot. Good job, you!”

3 Likes

This is the statistical equivalent of “trust me, bro” and it’s hilarious.

I’d be much more interested in knowing there wasn’t overlap across the groups.

If one player needs 100 games of hunter to climb to legend from D5, as in his other chart, then two top 1k players playing different decks in each group really messes up the numbers.

And that’s ignoring the fact that all of these numbers are interdependent… that is, the popularity creates the winrate, so the hypothetical meta is just gobbledygook, not “skill.”

It’s literally made up numbers that they are sticking a meaning to that isn’t based on anything sound.

I can concede that people at the top 1k are better at the game. That’s self evident.

But win rates in aggregate can’t be separated from playrates in aggregate because they 100% vary together in aggregate due to the RPS nature of this game.

It’s even more complicated when you look at how VS calculates wins using opponents and then smooths that artificially… which, incidentally, is why the numbers don’t match the raw data, as noted in the OP. And you understand that win rates must drop towards 50-50 (regression towards the mean) as opponent skill improves, because it’s the first time that the more skilled PC players who opted in to VS data are playing players closer to their own ability.

Trying to quantify the amount of regression towards the mean and calling it skill is pretty funny, and that’s what we have in this analysis.

1 Like

They are still good enough to be in the top 1000 playing the deck, so uhh… I don’t know where you think this massive skill drain is coming from.

It’s weirder that you assert it’s a huge effect.

The effect is largest on the decks that get fully abandoned at top legend, not the ones that are still around but are winning a bit less than they do at lower brackets.

Some decks just get really bad as you climb, and are better noob stompers than anything else. Pirate warrior was a great example of that. Oppressive until people figured out how to beat it, and it wasn’t even a playable deck in high legend most of the time.

1 Like

Then you should have no problem with the findings of the opening post.

I agree. And this is not part of my methodology.

You do realize that there are multiple aggregates, right? My post uses three (two actual and one artificially constructed), but Diamond winrates are associated with Diamond playrates and Legend winrates are associated with Legend playrates. What I’m primarily measuring here is the difference between the winrates of different aggregates. We’re not separating winrates from aggregates; we’re constructing an aggregate midway between the two real aggregates in order to control for one variable at a time.

This is kind of like saying that spaceships can’t get off of planet Earth because of gravity. I absolutely agree that matchmaking by skill does tend to drag winrates towards 50% the more matches that players play, but there’s an important distinction in meaning between “tend to” and “must.”

If you look at the results, the effect of increasing skill (for both players) on decks is generally negative, not positive. That actually supports the “tends to” version of your regression to the mean theory. But this is not the case for EVERY deck because it’s not a “must.”

  1. This is essentially an idea on how to measure skill, which for like the fourth time is NOT what I was doing in the OP. I was measuring winrate change per unit skill, where the unit is defined away as “the difference in skill between the average Diamond 4-1 and the average Top 1000 Legend player.” Like how the AU is just defined as the (average) distance from the Earth to its Sun.

  2. I consider this to be a good rule of thumb but not a hard rule. For example, a single test that 20% of people fail seems slightly more skill-testing to me than two tests that 10% of people fail, taken “in series” (so you have to pass both to pass overall). Only 80% of people pass the first setup, whereas 81% pass the second setup. It should be obvious that more skill tests of the same challenge level are more skill-testing than fewer tests of that level, but not all tests are of equal challenge level, so it gets a bit more complicated than your rule of thumb.

So basically quality^quantity, if we express it mathematically. And your intuition is essentially that .5^10 is a lot less (harder) than .1^2, so it’s usually correct.
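If it helps, the “in series” comparison is easy to check directly: independent pass rates just multiply. A quick sketch using the numbers from the examples above:

```python
# Overall pass rate for independent tests taken "in series": you must pass
# every test, so the individual pass rates multiply.

def pass_all(*pass_rates):
    result = 1.0
    for rate in pass_rates:
        result *= rate
    return result

print(pass_all(0.80))         # one test that 20% fail       -> 0.80
print(pass_all(0.90, 0.90))   # two tests that 10% fail each -> 0.81
print(pass_all(*[0.5] * 10))  # ten coin-flip choices        -> ~0.001
print(pass_all(0.10, 0.10))   # two very hard tests          -> 0.01
```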

1 Like

This comment says you have zero understanding of what I’m talking about and that’s a you issue at this point.

Take a stats class or ten. Learn something.

But for starters… there are only 366 games in D4-1 in the rogue category for the calculations. By their own numbers:

That 366 could be the play of just three total players. That’s the point: sample size can be a HUGE factor.
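To put a rough number on the error-margin point, here is a normal-approximation 95% confidence interval for a winrate estimated from 366 games, under the very assumption being questioned here, namely that every game is independent. The 55% winrate is illustrative.

```python
import math

# Rough 95% confidence interval for a winrate from n games, using the normal
# approximation and ASSUMING independent games. The 55% figure is made up.

def winrate_ci(winrate, n_games, z=1.96):
    se = math.sqrt(winrate * (1 - winrate) / n_games)
    return winrate - z * se, winrate + z * se

low, high = winrate_ci(0.55, 366)
print(f"55% over 366 games -> 95% CI roughly {low:.1%} to {high:.1%}")
# -> roughly 49.9% to 60.1%, i.e. about +/- 5 points even before any
#    dependence between games (same few players, both sides recorded) widens it.
```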

It totally is, and you don’t even understand how. That’s the point.

No, they must because as the pool of total players dwindles, the likelihood that you’re actually recording both sides of the same game in your data increases. How many top 1k players do you legitimately think don’t use trackers? More than half? Less?

But even beyond that, it’s not a case for every deck because the meta changes.

Winrates are a snapshot of a specific set of parameters and are only relevant to the sample you took them from. They don’t generalize, and you know this, because you frequently remind people how bad the Bronze data from HSReplay is everywhere that isn’t Bronze.

You’re trying to generalize them to a different population, and it doesn’t work with any real validity because it’s just you playing with numbers.

This was your premise.

And that’s just not the case and can’t be determined from your numbers.

2 Likes

So what you’re telling me is that HSR or a similar site will be getting data from two players at the same time, where both say the game started at the same time (plus or minus a few milliseconds), with opponent names that match the other simultaneous reporter, with the exact same cards mulliganed and played, the same minions attacking, taking damage, and being destroyed, and the game ending at the same time, and yet somehow the HSR people are going to be like “DURR HURR DIFFERENT GAMES RECORD IT TWICE”

I mean, I don’t want to say it’s impossible because I never like to completely rule out extreme human stupidity without evidence to disprove it, but let’s just say that this is an example of a completely preventable error that doesn’t have to happen. “It MUST happen” is hard debunked.

Again, no I’m not. I am measuring the distance between populations. The subject of the opening post is NOT Diamond 4-1. The subject of the opening post is NOT top 1000 Legend. The subject of the opening post is NOT skill as some broad generality — at least, not past Part 1, which has one paragraph on it. The subject of the opening post is differences between Diamond 4-1 and top 1000 Legend and the ratio between those differences.

Hey it’s the key sentence from that one paragraph.

Well how do you explain it then? You have Diamond players play a bunch of games of specifically Mech Rogue versus specifically Enrage Warrior, and a bunch of Legend players doing the same thing but separately. For the sake of argument let’s say that the decklists are exactly the same, card for card. When the games are all done the matchup winrates for Diamond and Legend don’t match. What’s causing that?

You yourself agreed that the average skill level of Legend players is higher than the average skill level of Diamond players. We’ve controlled for deck selection entirely, there is zero meta influence here. So what is it if it isn’t skill?

Whatever you choose to call it, one thing is for certain: that difference is empirically measurable.

Your assertion is the difference is solely skill. That’s what you said.

I agree you can measure the difference in the win rates of that match up between the two brackets… on the face of it just by looking at them and doing subtraction, there it is.

The disagreement is that the difference is 100% skill.

This is, definitionally, an assumption. In order for someone to accept your conclusion they would have to agree to your assumptions.

I know that you want to make it for the sake of argument, but the fact that you have to put it out there as an assumption concedes that something other than skill is likely a not-insignificant factor: changes in deck lists for specific metas. If even 1% of the difference is deck changes, the 100% skill argument is invalidated.

But let’s go back to your sample size for just a second. What if several players in D4-1 played mech rogue at a 75% win rate against enrage warriors as they cruised to legend, then swapped into rainbow mage to just play around and enjoy the challenge in the top 1k legend meta. How does that influence the numbers we have?

Another set of assumptions concerns things like outliers and influence. We don’t know enough about the sample to understand what causes the fluctuations.

You desperately want skill to be the answer, but you haven’t done enough (and honestly can’t) to rule out other factors that influence weighted averages.*

(*If you were really going to go about this, you would need more information to understand whether your data was normal (measures of central tendency, standard deviations, skew, kurtosis, etc.), because any number of factors could make an average look different.)
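For reference, the checks listed in that footnote could look something like this sketch; the winrate sample here is random data standing in for real per-player numbers.

```python
import numpy as np
from scipy import stats

# Illustrative only: randomly generated per-player winrates in place of real data.
rng = np.random.default_rng(0)
winrates = rng.normal(loc=0.52, scale=0.05, size=200)

print("mean:     ", np.mean(winrates))
print("median:   ", np.median(winrates))
print("std dev:  ", np.std(winrates, ddof=1))
print("skew:     ", stats.skew(winrates))
print("kurtosis: ", stats.kurtosis(winrates))  # excess kurtosis
print("Shapiro-Wilk normality p-value:", stats.shapiro(winrates).pvalue)
```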

It depends. To what extent is your card choice when “tweaking” a netdeck an expression of skill? A difference in skill could be the reason why you have one card in your Enrage Warrior that another player doesn’t. Are you implying that Legend players don’t, in general, run more refined decklists than Diamond players?

So is your answer to my question that it’s basically random statistical noise?

If I do this again next VS report and get very similar results, wouldn’t that be good evidence that there isn’t much noise there? I mean, I don’t think that the skill gap between Diamond and Legend players is perfectly static, perhaps especially not when going from the end of a month to the beginning of a new one… but I don’t imagine it changes radically.

But I probably should concede to you that, yes, acting like there’s 0% noise to signal was probably an extreme claim. An exaggeration. But I would still contend that the vast majority of the effect is caused by skill. Something like 95%.

I mean, you are basically ignoring the idea that there could be a greater number of skilled players playing mech rogue in D4-1, which would artificially inflate the win rates there; that is exactly the opposite of your conclusion.

Again, your data is insufficient to make your conclusion.

1 Like

Greater compared to what? Top 1000 Legend? And do you mean a greater percentage or an actual greater number? If they’re so skilled, then why are they still on D4-1 at the end of the month? Is it because they don’t play much? Because we’re not even looking at things as a percentage of players but as a percentage of GAMES, so a great player who plays less doesn’t count as much as a player who plays more.

I am not quite ignoring this, but it would be totally fair to say I’m dismissing it, and you should too. On average games played by Mech Rogue players at Diamond have a less skilled pilot than games played by Mech Rogue players at Legend. I don’t have or need data for that (unless you count the fact that Legend players just flat out have higher monthly winrate, or they’d still be Diamond players), that’s just the way it is.
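The games-versus-players distinction is easy to illustrate. In this small sketch the two player records are invented; it just shows that a games-weighted aggregate and a per-player average are different numbers, and that the low-volume player moves the aggregate less.

```python
# Games-weighted vs. per-player-averaged winrate. The aggregates in the
# reports weight by games, so a strong player who plays little moves the
# number less than an average player who plays a lot. Records are invented.

players = [
    {"name": "grinder", "games": 300, "wins": 150},  # 50% over many games
    {"name": "hotshot", "games":  20, "wins":  15},  # 75% over few games
]

games_weighted = sum(p["wins"] for p in players) / sum(p["games"] for p in players)
player_average = sum(p["wins"] / p["games"] for p in players) / len(players)

print(f"games-weighted winrate: {games_weighted:.1%}")  # ~51.6%
print(f"per-player average:     {player_average:.1%}")  # 62.5%
```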

It doesn’t need to be 100% skill. It’s still almost certainly primarily skill differences.

People don’t typically change decks arbitrarily. Human psychology tends to stick with things that are working. It usually requires the deck to stop being effective at continuing the climb before someone abandons it.

It’s not wrong to say that X deck is getting worse because of people playing better. Deck changes may magnify this a bit, but it doesn’t make the statement wrong.

I’m curious so please indulge me if you CBF 🙂