Polarization and the effect of skill, DR#279

I think you are just lumping luck in with more factors than I generally do.

My personal definition of luck is games that are decided in ways that personal decision-making or deck choice can’t overcome.

For example, I consider it a “luck” win when a paladin gets to showdown, double sea giant, and prismatic you on turn 4 or 5. It doesn’t really matter what you did before or do after, almost regardless of deck choice, luck decided that game.

Being paired up against a paladin as a druid I’d put into the deck popularity category more than the luck one. The average matchup is really bad for you there. The paladin doesn’t need good luck to beat you, they need particularly bad luck to lose.

In any individual game luck can have an outsized impact, but those effects aren’t nearly as overriding across the entire meta as they are in a single game. I.e. you would never really be saying “oh, at top legend, people are luckier, thus it’s a tier higher.” The luck effects are already built into the average win rates from top to bottom.

The flip side of Yoda’s “do or do not, there is no try” is that there is no can’t. There’s just games where skill overcomes and games where it doesn’t. Do or do not. This is one of those situations. I am dealing with percentages, not binaries.

The average polarization of D4-1 overall is 13.99%. That means that factoring out deck selection, the matchup usually starts with one player having a 57-43 advantage. That’s what skill, either in the positive form of good play by the underdog, and/or the negative of accidentally throwing by their opponent, has to overcome.
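As a rough sketch of that arithmetic, assuming polarization is measured as the full gap between the two sides' win rates (so a 55-45 matchup counts as 10%), the average converts to a win split like this. The function name is only for illustration:

```python
def split_from_polarization(polarization_pct):
    """Convert a polarization figure (favored % minus underdog %) into a win split."""
    favored = 50 + polarization_pct / 2
    underdog = 100 - favored
    return favored, underdog

favored, underdog = split_from_polarization(13.99)
print(f"{favored:.1f}-{underdog:.1f}")  # roughly 57-43
```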

I’m not saying that it can’t be overcome; obviously it can, because we see in the data that it sometimes is. But it’s a significant challenge.

I mean, there is definitely can’t. There are some plays that decks literally have no answer to, and sometimes they happen early enough that there is no way to prevent them.

You only see that in individual games though, not aggregate percentages, and we have no way to quantify what part of the win rates that accounts for.

I anecdotally find that those situations happen more in highly polarized matchups, but I couldn’t even begin to put a number to it.

Sure, but those reasons have no relation to my point, which is that removals not lining up is not a test of skill.

What you do if you got bad draws is a test of skill, sure, but just the removals not lining up in and of itself is not a test of skill. I pointed out that even aggro can have better or worse draws. In the case of bad draws, aggro will also have to react accordingly.

Bad draws are something that just happens.

Aggro is less likely to get bad draws because they built their deck to have a good curve? That proves my point further. We know there are ways to minimize the impact of draws not lining up. Control is choosing not to take those options. So the fact that they end up having to be “tested” isn’t a demonstration of their skill, but rather them suffering the consequences of their own choice.

I could for example, as I often do, run a deck that has many RNG effects, so I end up having to make many more choices on the fly. Is that a test of my skill? Not really. It’s more of a demonstration I’m at best more interested in fun and memes than winning. At worst it just means I’m a masochist :stuck_out_tongue:

If I pull off a win with my RNG clown fiesta deck, my opponent is more likely to scream “wtf is this RNG bs” than “wow he was really skilled, and handsome”. Though they’d be correct about the handsome part…

The way I see it, sometimes it is and sometimes it isn’t. There ARE situations where, for example, you should hold off on a board clear for one additional turn in order to kill more minions with it, especially if you can use that mana towards something else like card draw. In that case your skill is lining up your removal on an extra minion.

However, in order to even do that you need to draw the card first, and the best way to answer all the threats is to draw multiple correct answers and not even need to hold back. So there’s a huge luck element as well.

As I already said I think luck is the bigger factor here. Skill is a factor, and most importantly a factor under some degree of individual control, but it’s not the biggest factor.

One thing I forgot to address here is deck popularity. Now of course you could imagine a meta that’s 99% scissors to your rock and blame the fact that you don’t have a 99% winrate on deck popularity. You can’t really untangle the meta as a whole from performance.

One thing my analysis covers is the effect of the difference between the D4-1 meta and the top Legend meta on winrate. Unlike trying to untangle the entire meta, this is just looking at one measurable meta shift.

The effect of that particular meta difference is approximately equal to the effect on winrate of the skill difference between those ranks, in this report anyway. The highest effect of the meta shift is +2.78% and the lowest is -2.39%. Top Legend might be different from D4-1 in terms of what opponents you’re facing, but the effect of this on winrate really isn’t that huge.

It’s also worth noting that the meta effects on winrate at top Legend, although small, are usually rational. By which I mean: the deck that gets the largest winrate bump from the meta shift is Undead Priest, which goes from Tier 4 to Tier 4 as a result. With 0.11% popularity at T1KL, it’s just not worth metagaming against. The one that got the biggest decrease in winrate due to the deck popularity shift was Pure Paladin, because of course they’d metagame against the most popular deck.

So it really should just be “most of this game is straight luck of the draw.” Which is true. Deck popularity isn’t really a big factor. And deckbuilding skill REALLY isn’t a factor, because netdecking.

Do you have a deck list? I’m assuming it’s highlander? I don’t like the rainbow non highlander mage. I need a good list with rommath.

I strongly disagree. It’s a fact that as soon as Countess enters the Paladin’s hand, they’re over 2% LESS likely to win the game on average. This is by HSR, Diamond 4-1.

Regarding the Pure Paladin vs Rainbow Mage specifically, I don’t know. But it seems to me that the rationale for running Countess is flawed, even if it is good against Rainbow Mage. Maybe you make a bad matchup even, but you give up more in your other matches for a net loss. The equivalent of running a sideboard card maindeck. (That said, it’d suck being a blue deck in Magic and running across a maindeck Tsunami. Opponent rewarded for their badness.)

This is so confusing, how can such a strong card reflect a drop in win rate when drawn? The opportunity cost of drawing it instead of the low curve spells you should be drawing like boogie?

I played pure Paladin when it wasn’t a brain dead deck, back when weapon builds gave it burst turns, and I recall countess draw early sucking, because you really want to be drawing on curve, plus you can tutor your big cards at any time with order in the court.

I wonder if the stats you are looking at are confusing this. Sure countess sucks when drawn early, especially the Mulligan, in which case I understand the drop in win rate due to how important a strong early game is to Paladin. But you can’t discount her ability to just outright win games, by players finding exactly what they need.

For example, I had a game against a Paladin wherein I had them beat as long as they didn’t heal more than 10, sure enough Paladin finds copies of 2x priest colossus and heals for 16.

To a certain extent yes. A maximum skill exists somewhere.

The problem is that the maximum is not as tangible as most people believe, so good luck to anyone trying to reach it.

Matchup-specific mulligans, extreme math calculations on certain cards (Prison of Yogg-Saron), dealing with non-ideal RNG results and taking them into account when deciding to use RNG effects rather than praying.

It’s not that you’re wrong about its existence somewhere.
Just that even in a game as simple as Hearthstone, that limit is far more theoretical than tangible.

We are not talking about “Getting some results”.
We are talking about the very limit of what a human being can do.

And the very limit in fact includes insane stuff like that.

I don’t think you’ll like my take on rainbow mage in this case. It’s not a highlander, and doesn’t run rommath. I don’t think as rainbow you want to out value your opponents, you want to out tempo them typically, which is possible due to all the early minions.

I’ll include my take on a list here. I think the weakest card in the deck is the kobold, and it could be replaced with some kind of tech.

Also, the 2x solid alibi is totally unique to my list; I haven’t seen it run anywhere, but I swear by it for most matchups. Sometimes they are dead draws, but you can use them proactively to soak up a big board turn which you plan to clear with creation.

rnbw

Class: Mage
Format: Standard
Year of the Wolf

2x (1) Arcane Wyrm
2x (1) Discovery of Magic
2x (1) Flame Geyser
2x (1) Miracle Salesman
1x (1) Tram Mechanic
2x (2) Cryopreservation
1x (2) Infinitize the Maxitude
1x (2) Kobold Miner
2x (2) Prismatic Elemental
2x (2) Void Scripture
2x (3) Solid Alibi
2x (4) Cold Case
2x (4) Inquisitive Creation
2x (4) Reliquary Researcher
2x (5) Wisdom of Norgannon
2x (6) Blastmage Miner
1x (6) Sif

AAECAaXDAwTr9AXR+AXKgwaYlwYNyt4E294EqpgFq5gF7PYFv/4F2P4FlYcGhY4Gg5UG85sGs5wGsp4GAAA=

To use this deck, copy it to your clipboard and create a new deck in Hearthstone

Everyone keeps talking about Rommath Mage and I can’t for the life of me figure out why people think it’s good. No one has a list that’s “good”.

People just wanna play new cards and use it as an excuse.

It depends on what you mean by “exist.” There might be some theoretical maximum skill, but even Stockfish can’t play chess perfectly yet. Just much better than humans, on average. Does the theoretical exist? In one sense yes, in another no.

But I think we’re mostly on the same page. :grin:

That’s because there isn’t one. If you’re going to experiment I’d try building some kind of Rainbow-Excavate hybrid.

Very close to what I’m running at the moment. I’m on Blast Wave and the second Kobold over Alibi but may test that change. It’s a lot of Frost spells, though and I like the 3rd fire spell.

20 - 5

Summary

2x (1) Arcane Artificer
2x (1) Arcane Wyrm
2x (1) Discovery of Magic
2x (1) Flame Geyser
2x (2) Cosmic Keyboard
2x (2) Cryopreservation
1x (2) Infinitize the Maxitude
2x (2) Kobold Miner
1x (2) Vast Wisdom
2x (2) Void Scripture
1x (3) Prince Renathal
2x (3) Reverberations
1x (3) Rustrot Viper
2x (4) Cold Case
2x (4) Inquisitive Creation
2x (4) Reliquary Researcher
2x (4) Volume Up
2x (5) Burrow Buster
1x (5) Star Power
2x (5) Wisdom of Norgannon
2x (6) Blastmage Miner
1x (6) Sif
1x (9) Grand Magister Rommath
1x (9) Yogg-Saron, Unleashed

AAECAf0ECJfvBKOQBZCWBfPyBev0BdH4BamVBs2eBhDb3gSrmAWAwgXs9gXQ+AXe+AW//gXY/gXxgAbKgwbQgwaVhwaDlQbzmwaznAayngYAAA==

Originally I ran this build, but found blast to be nearly useless unless you draw it the same turn, and the kobold is by far the weakest card in the deck. You could argue the only reason he’s in the deck is to activate the minion with the secrets, but that minion is not a strict necessity to play to win.

I think it’s safe to remove all three of those cards and tech something else in. 2x solid alibi gives really nice breathing room to find Sif in an aggro/tempo meta, and is especially good at setting up turns for getting maximum value off your creations.

If you are running into a lot of annoying control decks like Blood DK, consider swapping the two solid alibis for 1x Reverb and 1x Vast Wisdom.

The more likely answer is you are not able to compare these two groups in the way that you want to compare them and get interpretable results.

When two different groups are compared, one typically uses a standard unit like a z-score, which is in standard deviation units. This allows you to look at real differences from the average, scaled by the spread of each separate group.

In the most basic terms, a deck that has a 60% win rate at diamond may be two standard deviations above the mean both at diamond and at top legend, but only win 55% at legend. It looks like the matchup is worse, but in standard units, it is the same.
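To make that concrete, here is a small sketch with made-up numbers. The 50% means and the standard deviations (5 points at diamond, 2.5 at top legend) are hypothetical values picked only so the arithmetic matches the example above; they are not taken from any report.

```python
def z_score(win_rate, group_mean, group_sd):
    """Distance from the group mean, in units of that group's standard deviation."""
    return (win_rate - group_mean) / group_sd

# Hypothetical spreads: win rates are assumed to vary more at diamond than at top legend.
diamond_z = z_score(60.0, group_mean=50.0, group_sd=5.0)
legend_z = z_score(55.0, group_mean=50.0, group_sd=2.5)

print(diamond_z, legend_z)  # both 2.0: same standing relative to each field
```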

But directly plugging in the numbers as you have just creates a sort of noisy something that you cannot really define well enough, which would make inconsistent results for your hypotheses more likely than not.

Interesting effort, though.

As for the polarization, why are you not using the distance from 50% (an even match that has zero polarization) instead of what you’ve done? If I follow what you have done, you are recording the polarization twice by using the full amount in both decks when you calculate them separately. You would get cleaner results for the most part.

There are two sets of raw data here: matchup winrate and deck popularity.

You might have noticed that overall winrate is not included in that list. This is not something that I decided. If you are imagining the raw data as 27 archetype names each with a number of games won and a number of total games, that’s just not how Vicious Syndicate, the source of the winrate numbers, operates at all. They call that “actual winrate” and they reject it. Instead, the numbers you read in the Data Reaper reports are calculated as (to use Excel speak) the SUMPRODUCT of the matchup winrates and the deck popularity. They call this “estimated winrate” and consider it to be more accurate than actual winrate. You can read more about this here:
https://www.reddit.com/r/hearthstone/comments/f3661g/why_actual_winrate_data_is_flawed_and_why/
So no meaningful standard deviation is possible on the estimated winrate results that they publish.
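For anyone who doesn’t speak Excel, that SUMPRODUCT looks roughly like the sketch below. The three opposing decks and every number are invented for illustration, and dividing by the total share is my own assumption to cover popularity figures that don’t sum to exactly 1. This is not Vicious Syndicate’s actual code or data.

```python
def estimated_winrate(matchup_winrates, popularity):
    """Popularity-weighted sum of matchup win rates (Excel's SUMPRODUCT)."""
    total_share = sum(popularity.values())
    weighted = sum(matchup_winrates[opp] * share for opp, share in popularity.items())
    return weighted / total_share  # assumption: renormalize if shares don't sum to 1

# Hypothetical matchup win rates for one deck against three opponents.
matchups = {"Deck A": 0.60, "Deck B": 0.45, "Deck C": 0.50}
# Hypothetical share of the field for each opponent.
field = {"Deck A": 0.50, "Deck B": 0.30, "Deck C": 0.20}

print(round(estimated_winrate(matchups, field), 3))
```

The point is that the published winrate is computed from the matchup table and the popularity table, not counted directly from games won and lost.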

Regarding matchup winrate data, I consider every individual game to be of equal weighting. I’m not going to take a standard deviation of 702 matchups (27 archetypes, squared, minus 27 mirror matches) when the Pure Paladin vs Naga DH matchup is orders of magnitude more common than Mech Rogue vs Elemental Shaman (both below 1% popularity in D4-1). So as far as I’m concerned, the raw matchup data is 2,193,558 instances of archetype, opponent archetype, and either a 0 or a 1. Because the number of zeros tautologically equals the number of ones, the standard deviation is 0.5. Yawn.
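A toy version of that bookkeeping, with a handful of invented games: every game is entered once from each side, so the ones and zeros always balance. The pairings and results here are made up; only the mirroring matters.

```python
from statistics import mean, pstdev

# Hypothetical games as (winner, loser) pairs.
games = [
    ("Pure Paladin", "Naga DH"),
    ("Pure Paladin", "Naga DH"),
    ("Naga DH", "Pure Paladin"),
    ("Pure Paladin", "Rainbow Mage"),
]

# Each game yields two records: (archetype, opponent archetype, result).
records = []
for winner, loser in games:
    records.append((winner, loser, 1))
    records.append((loser, winner, 0))

results = [r[2] for r in records]
print(mean(results), pstdev(results))  # 0.5 and 0.5, whatever the games were
```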

What do you mean, cleaner? In a 55-45 matchup, the way I’m measuring it is 10%, the way you’re measuring it is 5%. It’s just doubling the number. If you don’t like that, just halve it in your head.

Now why am I doing it that way? First off, it makes the maximum polarization (a 100-0 matchup) 100% polarization instead of 50%. Second, the last time Vicious Syndicate released an article on polarization, they did the same thing.
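The two conventions side by side, as a quick sketch (the function names are just labels made up for this post):

```python
def polarization_full_gap(win_rate_pct):
    """Favored side's win rate minus the underdog's (the convention used here)."""
    return abs(win_rate_pct - (100 - win_rate_pct))

def polarization_from_even(win_rate_pct):
    """Distance from an even 50-50 matchup (the alternative convention)."""
    return abs(win_rate_pct - 50)

for wr in (55, 100):
    print(wr, polarization_full_gap(wr), polarization_from_even(wr))
# 55 -> 10 vs 5; 100 -> 100 vs 50
```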

You are not understanding any of what I said, but that is fine. This is a silly forum for fun, so your attempt at data analysis fits just fine.

Each sample, top 1k and diamond, has a different amount of variance in the group. This is very much relevant to your calculations but ignored by your analysis.

In a population mean, yes, but in your specific samples it does not.

So you have no reason other than to copy someone who knows more than you. Okay.

I will just let you know that I applaud your attempts despite the fact that they mean very little. Clearly, from your reply, you are not adept in these topics and I will let you get your clicks and attention without further critique because that seems to be the major reason you post here.

I don’t think it’s fine. I either want to know that I don’t know what I was talking about, or that you don’t know what you were talking about, and right now I really can’t tell.

Sample of WHAT? Group of WHAT?

Relevant HOW?

Every game has a winner and a loser. The data set is a collection of games. Could you explain how the mean could be anything but 50%?

Edit: I Googled “population mean.” We are talking about matchup winrates here. So if the deck tracker records a game where Pure Paladin wins against a deck whose archetype couldn’t be identified (for example, they conceded on turn 1) then the data is thrown out. The only way for [Pure Paladin vs Deck X] to get a win is for [Deck X vs Pure Paladin] to get a loss, and vice versa, on the basis of the same game. The SAMPLE mean is tautologically 0.5.

It’s not because “they know more than me.” It’s because they (obviously) have a bigger audience than I do, and if they’ve established a convention then it might be confusing to readers if I use a different one. Plus, converting between the two is a super simple 2:1 ratio anyway. Why does this even matter? It’s stuff like this that makes me wonder if you’re not arguing in good faith here.

I don’t have a college degree in statistics (or anything else) but I’m trying to be adept here. Obviously I’m passionate about that.

Nah, more of a secondary reason. The primary thing I’m trying to do here is come up with working theory on games, with some degree of predictive power. It’s something I’m just sort of intellectually interested in and want to know for its own sake.

I welcome your constructive criticism. By which I mean something I can learn from, instead of just putting me down and telling me I don’t know. If you don’t want to write it out, maybe a link to something useful (like the Reddit link I provided, which you strangely didn’t comment on). But right now all you’ve done is have me wondering where to even start and doubting myself, and until I get some kind of resolution it just kinda feels like being gaslighted.