The effect of skill, quantified. DR278

Okay, so part of the disconnect here is I am coming at it from an angle you don’t understand. I’ll try to give the short version as best as possible, but it’s not an exhaustive explanation.

What you seem to not understand is this, and forgive the comparison, but I hope it helps make the point: in this discussion, where you have asserted that you are showing a quantification of “skill,” you are in the same position as the person who comes here and tells us the game is rigged. As such, you have the burden to prove your analysis meets some common assumptions about data, its uses, normality, and so on.

Saying, “well, maybe, but I don’t think so” is insufficient because it’s handwaving away alternative explanations, and that’s what you keep doing in this discussion, which you would never accept from someone who says “trust me, bro, it’s rigged.” Never.

So hold that thought and let’s go back to the numbers. The two main issues are that your observations aren’t independent and that you can’t describe outliers.

When you do any sort of statistical analysis you typically “clean” the data, a process that involves checking assumptions and describing your sample. My largest complaint, one that you can’t answer from your data and can’t rule out from your data, is the question of influence. More on that in a second.

First, you are not considering that you don’t have 366 independent observations; you have repeated measures of a finite pool of players that you are treating as independent observations for the sake of a mean (which is a no-no for the type of analysis you’re doing, btw). You don’t have 366 different players versus Enrage Warrior, you have 20-100+ individuals playing repeated games against the deck.

We agree that skill does in fact play a role, and it would stand to reason that some of those players are better with the deck than others. In addition, some play more games with the deck than others, and when those two facts are put together, the average (winrate) is affected.

BUT you can’t quantify this in your data set, and you can’t rule out outliers in your sample causing the difference you see. (Curiously, this works in both directions: people playing deck X exceptionally poorly in Bronze can make a deck look awful when it’s actually Tier 2 or 3 at higher ranks.)
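
To make the weighting point concrete, here’s a minimal simulation (every number hypothetical) of a finite pool of players where the better players also happen to play more games. The game-level average the aggregate data reports lands above the unweighted player-level average:

```python
import random

random.seed(7)

# Hypothetical pool: 30 players, each with their own true winrate vs. the deck.
# Better players (higher index) also play more games -- the weighting issue.
players = [(0.40 + 0.01 * i, 5 + 4 * i) for i in range(30)]  # (true_wr, n_games)

games = []
for wr, n in players:
    games += [1 if random.random() < wr else 0 for _ in range(n)]

game_level_mean = sum(games) / len(games)                        # what the site shows
player_level_mean = sum(wr for wr, _ in players) / len(players)  # unweighted "skill"

# Because high-winrate players contribute more games, the game-level mean
# sits above the unweighted player-level mean.
print(len(games), round(game_level_mean, 3), round(player_level_mean, 3))
```

The same pool of people produces two different “average winrates” depending on whether you weight by games or by players, and aggregate data only ever gives you the first one.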

So, influence: if you plot the distribution, it looks like a cluster, right? Each individual in your group has an average win rate, and they are all different. There’s a median point, and there are dots above and below that line. At D4-D1, most of the dots are going to be concentrated near that median, because in sweaty tryhard land off the rank floor you don’t expect someone to have a 40% win rate and play 100 games. BUT there will be dots that are very, very far from the center of the cluster.

How far they are from that line, in either direction, in standard units, dictates their influence on the results. As an example, in multivariate stats we would use a statistic to model this in a multiaxial distribution before any analysis is done, and cases that are outliers are typically dropped prior to any regression or other multivariate analysis. (Mahalanobis distance, for example, is a common one, and the wiki gives an adequate treatment of the concept if you’re curious for more depth on what I’m describing here. There should be some pictures too, if that helps. I don’t mean this in any condescending or flippant way; pictures can help with the concept for visual learners.)
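
For the curious, a minimal sketch of that screening step, using a made-up two-column per-player table (games played, winrate) with two extreme cases planted by hand, and a fixed chi-square cutoff standing in for a proper table lookup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-player data: (games played, winrate). Most players cluster;
# two extreme heavy-volume cases are appended by hand.
cluster = np.column_stack([rng.normal(40, 8, 60), rng.normal(0.52, 0.03, 60)])
extremes = np.array([[120, 0.68], [100, 0.35]])
data = np.vstack([cluster, extremes])

mu = data.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(data, rowvar=False))
diff = data - mu

# Squared Mahalanobis distance of every case from the centre of the cluster.
d2 = np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

# Cases beyond roughly chi-square(2 df) = 13.8 (p < .001) would typically be
# flagged and dropped before any regression is run.
outliers = np.where(d2 > 13.8)[0]
print(outliers)
```

The two planted cases are the ones flagged; everything near the cluster survives. That’s the screening that aggregate winrate tables simply can’t support, because the per-player rows were never published.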

You can’t look at any of this in the aggregate data to tell us conclusively whether there were five or six or maybe even 50 players with phenomenal win rates who pulled up the average at D4-D1 and then stopped playing the deck. This data isn’t even collected, and it’s really important to establishing confidence in what you’re trying to do.

We don’t prove the hypothesis, we reject the alternatives, and you can’t reject the alternatives because this data doesn’t measure skill.

If you could take the individual players instead of the individual games, you could actually measure skill by looking at each player’s deviation from the population mean in standard units, but your analysis doesn’t actually measure anything because the data is wrong for what you are trying to prove.
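
A sketch of what that per-player measure could look like, with entirely hypothetical players and population figures:

```python
# Hypothetical population figures and per-player winrates vs. the deck.
population_mean, population_sd = 0.50, 0.04
player_winrates = {"A": 0.58, "B": 0.51, "C": 0.44}

# Each player's distance from the population mean, in standard units:
# a per-player skill signal that aggregate winrate tables can't provide.
z_scores = {name: round((wr - population_mean) / population_sd, 2)
            for name, wr in player_winrates.items()}
print(z_scores)
```

Player A sits two standard deviations above the population; C sits well below it. With the published data there is one blended average and no way to recover these.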

4 Likes

Again, “trust me, bro” is not quantified proof of anything.

You are assuming this is the reason, not proving it.

“The game is rigged because I said it is” would be an equivalent assertion, and one I know you would roundly reject without quantified evidence.

1 Like

You are right, you can’t fully quantify the effect. It’s far easier to see the effects of skill in decks that are gaining win rate than in the ones that are losing it.

You don’t see matchups majorly improving simply because people are abandoning a deck. If Hunter has a notable advantage against a deck, you don’t expect that to evaporate because “really, some of the ultra best Hunters in the highest ranks are just playing something else.” You can’t prove that’s happening. It’s an effect that typically doesn’t change overall rankings. In the same way, in World of Warcraft the worst DPS spec of a class isn’t changed just because the higher-performing players play the stronger spec. All that does is heighten the perceived differences.

So yeah, part of why a deck drops off is the deck choices of the people with the highest skill at that deck. But it is never the cause of a deck being worse; that has to happen before the abandonment. People aren’t going to just give up playing a deck with a 55%+ win rate for no reason.

First off, amazing effort went into this post. I don’t think I’d ever put in that much effort.

Digging into this a bit at a very high level: I don’t understand why you think one hypothetical meta is enough to isolate the effect of skill. There’s a nearly infinite number of hypothetical metas formed by all legal combinations of cards.

This seems to be a very rudimentary approach, wherein the scientific principle of controlled experimentation is applied to obtain a theoretical result.

As far as I can tell, you are assuming that changing one variable at a time isolates it. While that is true in experimentation (for very simple systems), it does not carry over to theoretical analysis.

This approach would be far more convincing if you made many, many hypothetical metas by bootstrapping from the different distributions and calculated your win rates over those. Bootstrapping would also let you estimate error margins.
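
A minimal sketch of the error-margin half of that suggestion, resampling a hypothetical 100-game log (55 wins, 45 losses) with replacement to get a rough 95% interval on the winrate:

```python
import random

random.seed(1)

# Hypothetical game log: 1 = win, 0 = loss, at an observed 55% winrate.
games = [1] * 55 + [0] * 45

boot_means = []
for _ in range(2000):
    resample = random.choices(games, k=len(games))  # sample with replacement
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
# Rough 95% interval: the 2.5th and 97.5th percentiles of the resampled means.
lo, hi = boot_means[50], boot_means[1949]
print(round(lo, 2), round(hi, 2))
```

On 100 games, the interval around a 55% observed winrate spans roughly ten points either way, which is exactly the kind of error margin the single headline number hides.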

1 Like

I don’t have to because I am not saying skill is the cause.

This is an assumption. You can’t show this and until you can, you can’t conclude that skill did anything.

You are discounting the reason.

They don’t want to play it anymore. Why is a quarter of the top-1k meta Rainbow Mage? Because people want to play it. That’s all the reason people need to switch, and until you’re looking at individual players, statements about skill are wrong.

You don’t seem to be saying there is any other cause, either.

If your objective is not simply intellectual nihilism, then please feel free to add something constructive to the discussion.

… um, really?

The fundamental problem is the data doesn’t do what he thinks it does and his analysis is just made up stats because of it. It sounds mathy, but it’s a nothing burger.

I’m sorry you don’t comprehend what I added to the discussion, but that’s still a you problem.

1 Like

You really need to move past telling people that their problem is that they just don’t understand the things that you do. And for the love of God, be concise. In a couple of hundred years the entire mass of the earth will be needed to create enough storage devices to archive the internet, and at this rate you will be a major contributor to that looming problem!

I agree that the analysis is not bulletproof; perhaps you have some constructive suggestions?

So I should lie more, got it.

I feel like I have in that I keep trying to explain that this data set doesn’t do what he thinks it does.

1 Like

Typically when people disagree it’s because neither of them understands the other, not because one is super smart and the others are Jon Snow.

Telling people you don’t know to take stats classes is dumb. Especially when those people are well past taking classes.

Anyway, back on topic, are you asserting that it’s impossible to draw any inferences from VS data about the impact of skill on win rates?

Yeah, that explains why there’s a lot of rainbow mage.

It’s not a huge factor on rainbow mage’s overall win rates vs other decks.

What constructive things are there to suggest?

If winrate goes from 52% to 55% (corrected for matchup spread) when going from Diamond to Legend, then the claim is that this 3-point increase in winrate comes from skill.
That’s not really shocking; what else besides skill could it be?

But yeah, for something constructive, how about this:

What this analysis is missing is the other side.
My winrate goes from 52% to 55%, and this is because the skill level in Legend is higher.
But at the same time my opponents’ winrate goes from 48% to 45%, and this is supposedly not because the skill level in Legend is lower.
If you look at this analysis from the opponents’ side (which is an equally valid point of view), then you can claim that Legend has a lower skill level, and you can even quantify how much lower the skill level in Legend is.

And you can reach this conclusion using the exact same logic and data as the OP used here, simply by looking at the other side of the equation!

This alone should be enough to tell people something is wrong with this analysis; some nuances must be missing. How else could you come to two opposite conclusions using the same data and the same logic?

But still, the analysis is not completely without merit; I won’t say that.
There are effects, like polarization increasing with skill level, that make one side of the equation more relevant than the other. Though I don’t think the OP is aware of that, since it’s not mentioned anywhere in the post.
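
The two-sided reading is just arithmetic on complements; a tiny sketch using the hypothetical 52%/55% numbers:

```python
# Hypothetical matchup-corrected winrates from the deck's point of view.
my_wr = {"diamond": 0.52, "legend": 0.55}

# The opponents' winrate over the same games is simply the complement.
opp_wr = {rank: round(1 - wr, 2) for rank, wr in my_wr.items()}

# The same +3 point jump read as "more skill in Legend" is, from the other
# chair, a -3 point drop that the same logic would read as "less skill".
my_gain = round(my_wr["legend"] - my_wr["diamond"], 2)
opp_gain = round(opp_wr["legend"] - opp_wr["diamond"], 2)
print(my_gain, opp_gain)
```

Identical games, identical logic, opposite conclusion, which is the warning sign described above.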

3 Likes

Not impossible, but difficult to do with any real degree of confidence simply because it’s just messy data.

I agree with you, there is merit to the attempt.

But you can’t quantify what it did to Mech Rogue vs. Enrage Warrior, and that’s pretty much the point here: this isn’t a collection of independent individual games, it’s the same people playing multiple times. You can’t isolate that in this analysis, and it matters to the outcome.

The win rates narrow rather than expand.

So I think the suggestion of repeated runs is sound. If there’s a genuine trend it will be present across each run and can be quantified, along with a signal to noise ratio.

1 Like

We have a binary data set of wins and losses that yields a single average win rate for a given deck matchup. I don’t see how that gets bootstrapped.

If we had the win rates of the individuals that played the games, we could do a very good analysis.

Imagine if this much passion and energy went into stuff like clean energy or conserving grasslands. Or educating people on civics and the duties of different branches/offices of government.

1 Like

Yes, really. You still are adding nothing, just attacking others’ methods.

Please feel free to correct me with your own hypothesis. If it “doesn’t do what he thinks it does” then what does explain the difference?

Don’t misunderstand, you could certainly be right. But if you do care as much as you seem to, and are calling everyone stupid and incompetent for not agreeing with your assessment, why not turn your laudable skills toward actually figuring out the right answer?

1 Like

I’m not sure if you’re aware, but method is critically important to confidence in the conclusion. Bad methods yield noise, not signals.

I’ve said we can’t do that with the data we have and explained why. In detail.

2 Likes

Duh. And as I said, challenging it isn’t inherently wrong.

And plenty of people still disagree with you, and instead of simply debating the merits, you went straight for the personal attacks.

If you can’t disagree without calling other people idiots for not seeing things your way, don’t post.

I’m asking you to take it down a notch. Even if you’re entirely right and the analysis isn’t perfect, the attempt should not be so viciously attacked. The way you position yourself and your arguments matters, and you’re being too personal. Everybody knows OP is working with incomplete data, yet that doesn’t mean we can’t ever learn anything.

I’d still like to hear your approach to try to assess the effect of skill on results. If not this, then what? If all you can say is “we can never know” and still try to blame the matchmaker for nefarious activity, then you’re being actively less than helpful.

1 Like