So I decided to look deeper into matchmaking

…and honestly I don’t know what to think. There are some things that do not add up in the most recent Vicious Syndicate report data.

So first off, let me explain the math here so everyone can follow along. I’ll try to explain the best I can, but I’m probably bad at it.

So the way that data is collected by VS is that you have players who have a valid deck tracker installed, and they play against the general population. The VS popularity percentages that they publish are taken purely from the opponents of players with the tracker, each of which counts as one game for the appropriate matchup in the matchup winrate chart. VS also publishes how many games were recorded per matchup, as well as total games per rank.

Getting the total games per D4-1 matchup is a pain, because they don’t publish a D4-1 matchup winrate table. Instead they publish a D4 and up winrate table, and a Legend winrate table. So I had to manually subtract one table from the other to get a proper D4-1 matchup table with the appropriate numbers of games.

So from here on the trick is to think geometrically. The number of games each matchup is expected to have in D4-1 is the total number of games in D4-1, 358000, times the percentage popularity of the deck among the general population, times the percentage popularity among players who have the deck tracker installed. The number we need to get to is the popularity of decks among tracker users. Geometrically, all 358k games make one huge square, with overall popularity on one axis and tracker popularity on the other axis, and each matchup is a rectangle such that all of the rectangles tile the square. So to get to tracker popularity, first we remove the portion of the square representing the 13.34% of decks by general popularity in the “other” category, which didn’t make the matchup winrate table. Then we use the total games a particular deck has among all its matchups on the matchup winrate table to determine what percentage of the area of the remaining shape it uses. This gives us the tracker popularity.

Tracker popularity, D4-1
| Archetype | Class | Tracker pop. | Overall pop. |
|---|---|---|---|
| Plague | DK | 5.98% | 6.99% |
| Rainbow | DK | 3.69% | 3.89% |
| Shopper | DH | 1.88% | 2.33% |
| Dragon | Druid | 0.89% | 0.89% |
| Hybrid | Druid | 4.00% | 4.00% |
| Reno | Druid | 4.79% | 3.86% |
| Token | Hunter | 12.51% | 12.91% |
| Rainbow | Mage | 2.05% | 2.00% |
| Spell | Mage | 2.58% | 2.56% |
| Aggro | Paladin | 3.65% | 3.58% |
| Handbuff | Paladin | 1.75% | 2.12% |
| Reno | Priest | 1.97% | 1.75% |
| Zarimi | Priest | 0.74% | 0.83% |
| Cutlass | Rogue | 1.20% | 1.09% |
| Excavate | Rogue | 4.82% | 3.98% |
| Pirate | Rogue | 0.72% | 0.69% |
| Reno | Shaman | 2.06% | 1.91% |
| Pain | Warlock | 0.79% | 0.99% |
| Sludge | Warlock | 1.18% | 1.40% |
| Snake | Warlock | 3.24% | 2.79% |
| Wheel | Warlock | 0.91% | 1.12% |
| Reno | Warrior | 29.12% | 24.98% |
| other | any | 9.48% | 13.34% |
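
As a sanity check on the geometric argument, here is a minimal Python sketch of the "remove the other strip, then measure area" step. It assumes the rectangle model described above and uses only numbers quoted in this thread (358000 total games, 13.34% "other" by overall popularity, and the 280820 games in the matchup table mentioned further down). The variable names are mine, not anything from VS.

```python
TOTAL_GAMES = 358_000      # all D4-1 games, from the post
OTHER_OVERALL = 0.1334     # overall popularity of "other", from the table
TABLE_GAMES = 280_820      # games in the matchup table, quoted later in the thread

# Area left after cutting away the strip of opponents on unrecognized decks.
games_vs_recognized = TOTAL_GAMES * (1.0 - OTHER_OVERALL)

# Share of that area covered by tracker users on recognized archetypes.
recognized_tracker_share = TABLE_GAMES / games_vs_recognized

# Whatever is left over is the tracker popularity of "other".
other_tracker_share = 1.0 - recognized_tracker_share

print(f"{other_tracker_share:.2%}")
```

The same division, applied to one deck's total games across its row instead of the whole table, would give that deck's tracker popularity under this model.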

So with this, the number of games we expect to see in each matchup is simply 358000 times the tracker popularity times the overall popularity. I set each matchup as its own data point and plotted them all on a scatter plot. Here’s the link:
https://i.imgur.com/nOGmyWK.png
One axis is the expected number of games on a logarithmic scale (so 2 equals 10^2 or 100 games, 4 is 10^4 or 10k expected games), the other axis is how far off it was.
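
For anyone who wants to reproduce a data point, here is a rough Python sketch of the expected-games formula and the two plot coordinates. The function names are hypothetical. One thing to flag: plugging the table's percentages straight into the formula gives about half of the 52080 and 56 expected games quoted for the two mirrors in the next paragraph, so the sketch doubles the mirror cells to match. That doubling is my inference from the numbers, not something the post states.

```python
import math

TOTAL_GAMES = 358_000  # all D4-1 games, from the post

def expected_games(tracker_pop, overall_pop, total_games=TOTAL_GAMES):
    """Expected games for one (tracker deck, opponent deck) cell under random pairing."""
    return total_games * tracker_pop * overall_pop

def plot_point(expected, observed):
    """Return (log10 of expected games, relative deviation from expectation)."""
    return math.log10(expected), observed / expected - 1.0

reno_cell = expected_games(0.2912, 0.2498)    # Reno Warrior mirror, single cell
druid_cell = expected_games(0.0089, 0.0089)   # Dragon Druid mirror, single cell

# Assumption: mirrors are doubled to line up with the post's quoted figures.
reno_expected = 2 * reno_cell
druid_expected = 2 * druid_cell

reno_x, reno_y = plot_point(reno_expected, 60_660)
print(round(reno_expected), round(reno_x, 2), f"{reno_y:.1%}")
```

The small gap between these results and the post's figures is consistent with the table's percentages being rounded to two decimals.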

So this plot is… somewhat alarming. Technically, the matchup which was the furthest % off of expectations was the Dragon Druid mirror, with 82% more games than expected… but honestly that one doesn’t bother me that much, because we’re talking 102 games when only 56 were expected. (With fewer than 100 games VS doesn’t publish data at all.) No, the data point that is alarming is the Reno Warrior mirror. 52080 games were expected, because it’s a very popular deck among the overall playerbase and even more popular with tracker users, and yet a whopping 60660 games were recorded.

That’s 8580 games more than expected under random matching. Fully 16% more.

I don’t think I have enough information here to come to any proper conclusion, but I do consider this fact to be extremely weird. It’s potentially evidence that, when one class is extremely popular, there is some algorithmic manipulation to make the mirror matchup of that class more likely to occur (so other players don’t play against it so often) — but I’d want to run the same calculations on the next VS report before jumping to that conclusion. But it does feel similar to the “anti-mana screw/flood” rigging that Wizards of the Coast was caught doing in Magic the Gathering online.

Or maybe VS is fudging numbers. In any case, I’m going to keep looking into it.

Why not do it on the previous reports? Why wait?

There may be an easy explanation for it: since they group similar decks based on God knows which criteria, they may have falsely identified that many opponent decks as mirrors.

For example, Reno Warrior and Mech Excavate Warrior differ by only a few cards, and if they check for overall deck similarity rather than for duplicates, it’s a pretty common mistake to make. That’s why I don’t trust that data more than I trust what I experience on the ladder (and my own data).

Well, if it’s not rigging, then that would imply bad data on VS’s part. So yes, trust level lowered.

Also I’ve had enough of the rabbit hole for today, I’m going to go touch some grass. Maybe later

I told you that a million times. You can’t possibly make aggregate data as precise as you think.

If you decide to only count exactly the same decks, you risk running out of adequate sample sizes.

If you decide to count them based on some similarity factor, you risk overgeneralizing.
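
To illustrate that tradeoff, here is a toy Python sketch. It is not how VS classifies decks (the thread doesn’t say how they do it). The card IDs are placeholders rather than real decklists, and both the Jaccard measure and the 0.7 cutoff are arbitrary choices of mine.

```python
def jaccard(deck_a, deck_b):
    """Jaccard similarity of two decks treated as sets of card names."""
    a, b = set(deck_a), set(deck_b)
    return len(a & b) / len(a | b)

# Placeholder card IDs: two 30-card singleton decks that share 26 cards
# and differ in 4.
shared = [f"card_{i:02d}" for i in range(26)]
deck_one = shared + [f"one_only_{i}" for i in range(4)]
deck_two = shared + [f"two_only_{i}" for i in range(4)]

similarity = jaccard(deck_one, deck_two)

# Exact matching keeps the decks apart (and shrinks each sample) ...
same_by_exact_match = set(deck_one) == set(deck_two)
# ... while a similarity cutoff lumps them into one archetype.
same_by_threshold = similarity >= 0.7  # arbitrary illustrative cutoff

print(round(similarity, 3), same_by_exact_match, same_by_threshold)
```

Under a cutoff like this, an opponent on the second list would be logged as a mirror for a tracker user on the first.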

We’re definitely not talking about small sample sizes here. Since when is 60 thousand games small?

This is big. If it’s deck misidentification, it’s big time deck misidentification.
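
For a rough sense of how far outside sampling noise this is, here is a back-of-the-envelope Python sketch. It treats every one of the 358000 games as an independent draw with the expected mirror probability. That binomial model is a simplifying assumption of mine, not something VS or the original post claims.

```python
import math

TOTAL_GAMES = 358_000       # all D4-1 games, from the post
EXPECTED_MIRRORS = 52_080   # expected Reno Warrior mirrors, from the post
OBSERVED_MIRRORS = 60_660   # recorded Reno Warrior mirrors, from the post

# Probability that any single game is a Reno Warrior mirror, if the
# expectation were exactly right.
p = EXPECTED_MIRRORS / TOTAL_GAMES

# Standard deviation of the mirror count under the binomial assumption.
std_dev = math.sqrt(TOTAL_GAMES * p * (1.0 - p))

# How many standard deviations the observed count sits above expectation.
z_score = (OBSERVED_MIRRORS - EXPECTED_MIRRORS) / std_dev

print(round(std_dev), round(z_score, 1))
```

Under these assumptions the gap comes out at several dozen standard deviations, so random fluctuation alone looks like a poor explanation. It says nothing about which non-random cause (misidentification, data handling, or matchmaking) is responsible.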

Yeah but that’s because they chose to go with the similarity index and overgeneralize.

I don’t see mech excavate warrior on the list, which means it’s under “Any” or under “Reno Warrior”

Since many other decks are missing from the list, it’s safe to assume the data aren’t as perfect as we’d like them to be.

Also, you’re dramatizing. 8k games out of 358000 games is 2.2%.

That’s not a huge margin of error. This isn’t theoretical physics, we can deal with that.

Honestly, I expected it to be bigger.

Good work, btw.

I’m already accounting for decks that aren’t a recognized archetype.

  • 33952 games where the tracker player is not playing a deck archetype recognized by VS.
  • 43228 games where the tracker player is playing a recognized archetype, but the opponent isn’t.
  • … which leaves 280820 out of the original 358000 where both players are playing recognized archetypes. That’s the total number of games in the matchup winrate table.

Yeah, there isn’t any data on the unrecognized archetypes, but the lack of data leaves a hole, and I’ve mathed out exactly the size and borders of that hole.
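
The three-way split above can be checked with a few lines of Python. This is only a sketch using the figures quoted in this thread; the variable names are mine.

```python
TOTAL_GAMES = 358_000            # all D4-1 games
TRACKER_UNRECOGNIZED = 33_952    # tracker player on an unrecognized deck
OPPONENT_OTHER_SHARE = 0.1334    # overall popularity of "other", from the table

# Games where the tracker player is on a recognized archetype.
tracker_recognized = TOTAL_GAMES - TRACKER_UNRECOGNIZED

# Of those, the games where the opponent is on an unrecognized deck.
opponent_unrecognized = round(tracker_recognized * OPPONENT_OTHER_SHARE)

# What remains is the matchup table: both players on recognized archetypes.
both_recognized = tracker_recognized - opponent_unrecognized

print(opponent_unrecognized, both_recognized)
```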

OK, that means 2.85% instead of 2.2% error xD

Let’s be logical here: if people noticed their warrior games were mostly mirrors, would they keep playing warrior in such copious amounts?

It was 25%, then 29% and now I’m getting 35% of them

Somehow, more and more people play it every day, instead of less

You’re not talking about just under 3% of Reno Warrior players, you’re talking about 1 in every 36 games in D4-1. A small percentage of a gargantuan number is still a big deal.
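
Here is the percentage arithmetic from the last few posts as a small Python sketch, using the exact excess of 8580 rather than the rounded 8k. The choice of denominator is the whole disagreement, so both are shown. Variable names are mine.

```python
EXCESS_MIRRORS = 60_660 - 52_080   # 8580 games above expectation
ALL_GAMES = 358_000                # every D4-1 game
TABLE_GAMES = 280_820              # games where both decks are recognized

share_of_all = EXCESS_MIRRORS / ALL_GAMES
share_of_table = EXCESS_MIRRORS / TABLE_GAMES
one_in_n = TABLE_GAMES / EXCESS_MIRRORS

print(f"{share_of_all:.2%} {share_of_table:.2%} 1 in {one_in_n:.0f}")
```

With the exact excess, the shares come out a little higher than the 2.2% and 2.85% quoted above, and the "1 in N" figure lands in the same ballpark as the 1 in 36 mentioned.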

Yes. I’ve already covered in previous posts that the playerbase refuses to metagame (verb).

Anyway, with so many variables which can go wrong (data collection, data presentation, calculation), I’m not at all inclined to believe in the possibility of rigged matchmaking, especially when the error is less than 5%.

If you want to feed the tinfoil hats and whiny kids on this forum, go ahead, but I’m not helping you do that by agreeing xD

Luckily for the whole forum, no one likes to read threads in which you and I do most of the talking.

Would matchmaking be considered rigged if it accounted for the deck that is being played? I don’t personally see the connection.

I believe there is likely a weird set of tie-breaking criteria in the matchmaking algorithm. That is, there are roughly two modes I can see: the mode where matches are scarce and it seeks a wider range of players (for example I can get matched with a 1.5k player at 600 or vice versa), and the mode where matches are plentiful and it needs to decide whom to pick.

I believe in the latter case there’s probably weird logic that can be programmed into the tie breaking, or choosing the preferred opponent. I doubt they choose the opponent with the closest skill only, there’s probably some calculations based on historical statistics or data of the players that determine the best match.

It’s then very likely there is a correlation between the historic data and what the players prefer running currently.

Another option is that Blizzard could very well be using class, card or deck composition information to break ties.

I’m not finding any in my tracker data, even when I’m on bad streaks and it seems like there’s a connection

Nah. No need to do that.

Besides, if they knew or wanted to program composition-based matchmaking, their balance patches would be much more on point. The game is too complex for that.

You can counter one deck, but if that deck changes 2 cards, suddenly it counters you.

As I said previously, there are many, many, many things which can go wrong (and usually do) with the data Scrotie analyzed, and they are much more likely to explain the discrepancies than the game being rigged.

And why would it be? What’s the agenda? When a class is broken, it’s obvious from miles away, so trying to hide it with rigged matchmaking isn’t going to work.

Anyway, I’ve always clearly told you that the aggregate data CANNOT and MUST NOT be relied on blindly. You always have to take multiple variables into consideration which can (and do) impact the interpretation.

Just say the word and I’ll count and explain at least 20.

There’s an incentive to give streamers and public figures that put Hearthstone on display to the public favorable RNG or matchmaking.

Not saying it’s happening, but how certain are we it’s not?

I’m very certain it’s not the case, because:

a) I watch them while I play all the time, and
b) I play against them all the time.

They go through similar streaks like you and I do, falling from top 100 to top 2k and then to top 50, unless they get a very lucky break at the end of the month to finish top 50 and then their mmr gets frozen there when the new season starts.

Also, that idea might be the worst one mentioned in this thread. While other things are technically possible, as long as they play the same patch and same queue as we do, it’s impossible to rig the matchmaking in favor of a few individuals.

Is it so hard to believe they’re just better on average because:

a) it’s their job which they do more than 8 hours a day, and
b) they have chat and other streamers backseating and helping them?

Btw, b) is something I’m strongly against. Many times, when you queue into them, you’re not even aware you’re actually playing against 2-3 pros and 500 chat legends, unless you’re sniping.

But those things happen often.

P.S. I’ve been lowrolling for 2 days non-stop. It’s disgusting. I’m the one who should be saying those things, but I’m not xD

Anyway, judging by the fact that I highrolled for 2 hours straight on NA 3 days ago, and that a week before that and 2 days after it I had such unfair things happen to me, I think we’ll be seeing me in top 50 in a day or two. I’ve figured out the advantages of playing on multiple servers. You can basically rig your own matchmaking like this: when you lowroll, go to the server you don’t care about or the one with the worst current MMR. When you start highrolling, switch to the one with the best MMR. That way you maximize your chances to get a high ranking on that one server.

When I start to highroll it’s going to be disgusting. Even while lowrolling, I have 15-9 on pally, and that’s with 2 sea giants and a Southsea Deckhand in starting hand.

My guess is more that how you calculated the matchup data just differs a bit from the raw data that VS used for the popularity graph in D4-1.

It could be something stupid like legend’s matchup data excluding top 1k, but D4 and up including it.

Margins are very small here.

The data will be noisy: it is not a controlled sample, and it shifts while it’s being collected. It also reacts to its own composition over long enough time-frames.

That said, a rigging hypothesis based on deck type will be reflected in significant matchup variance between deck types if true. People aren’t going to notice 5 games out of 100. They’re saying that most matchups depend on your deck type. That’s something you can’t miss.

The one thing I can tell from these statistics, and from the other thread with the popularity statistics, is that there is some free-form delusion going around in Hearthstone that makes players think Plague DK is a good, playable deck. I’m not saying the data is wrong, because I still hit those streaks of free wins against Plague DKs in D5-Legend.

I’ll give them props for being stubborn though.

I mean, as I said there’s a billion variables impacting and tainting the data.

One of them, related to the posts above, is that there are classes preferred by most of the playerbase, which every single player plays for 2-3 games a day, and that artificially inflates their popularity stats.

It’s not like 20% of people play a 33% winrate deck all the time; no one’s that stubborn. It’s that 90% of people play that deck 1-2 games per day, to enjoy the playstyle they prefer and to avoid burning out from the boring and life-sucking grind.

I’ve talked to people in streamer chats, many of them confirm this hypothesis. I myself start my grind by losing 2-3 games with an unviable deck simply because I like playing it and it will never change.

When I lose a lot of games, I do the same, to prevent tilting and burning out. When I’m already losing no matter what I play or do, a couple of fun games can only do me good.

That’s just one of the variables tainting the data, completely unrelated to this thread, but it’s an answer we’ve been looking for regarding some classes like Mage and DK.

The scientific approach is to see what you can rule out with the available data. All scenarios that cannot be ruled out are possible, but not necessarily probable.

Agree there are likely to be many possible causes of a 2-3% variance, most of them very boring.

The Fermi paradox page on Wikipedia is actually a good example of how you can explore many scenarios that fit the available data, rather than saying “I believe xyz is happening because it fits the data”.