Evidence for the Failure of Competitive Matchmaking in OW2

Thanks to everyone who replied! If you haven’t already, please make sure to take a look at the chart that illustrates the main finding:

https://i.postimg.cc/tgFkDKWT/overwatch2-season3-matchmaker-moving-average.png

(It’s unfortunate that the forums won’t allow embedding an image; having the main plot right in the middle of the text would have looked much better. As it is, the URL gets lost in the rest of the text.)

To the individual points:

This is one possible explanation, yes, and I would not dismiss it out of hand, but it leaves two important questions unanswered:

  1. Why does the distribution reverse modes with such consistent regularity, i.e. every 25 games?
  2. Why does the effect appear even though every cycle spans multiple play sessions?

These are related, but let’s consider (1) first.

It is plausible to claim that players might get into a certain behavioral or psychological “groove”, and that the concomitant intensity burns out and leads to a crash. Certainly, there are many relevant variables that might drive player behavior in a way that could affect performance, and thereby outcome. The problem is that these effects ought to be local with respect to clock time, not with respect to match count. That is to say: if I burn out after an intense winning or losing streak, one would expect to be able to predict this outcome after a certain number of minutes or hours, regardless of the number of matches played. Attention definitely fatigues after some time, but it varies with time spent, not with matches played. The seasonality observed in the time series above, however, is tied to match count, not to clock time. It would be extremely strange, to say the least, if I consistently became fatigued or inspired at 25 matches, when the total clock time needed to complete 25 matches varies as much as it does.

Point (2) is even more serious.

Here’s a plot from the dataset, showing how many matches I played per single calendar day:

https://postimg.cc/dk6nk5tP

One thing should immediately jump out: I never played enough matches in a single day to traverse a full 25-game cycle. Most days, I played fewer than six matches. If the 25-match effect is based on some kind of player fatigue or intensifying emotional affect, how does that fatigue or affect carry over to a session that usually takes place about 24 hours later? What about the fact that I’ll be eating, sleeping, working, and doing things other than Overwatch in between? Never mind the fact that it would usually take multiple such sessions to complete a full cycle. To be honest, it seems far-fetched to propose that I stay so tired and angry between sessions that my losing streak can pick up right where it left off, and then end right on schedule.
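A per-day tally like the one in that chart can be rebuilt from a dated match log. This is a minimal sketch assuming a hypothetical log format (one ISO date string per match); the actual spreadsheet’s columns may differ. It also checks whether any single day contains a full 25-match cycle.

```python
from collections import Counter
from datetime import date

def matches_per_day(match_dates):
    """Count matches per calendar day from ISO date strings like '2023-02-10'."""
    return Counter(date.fromisoformat(d) for d in match_dates)

def any_day_spans_cycle(match_dates, cycle_length=25):
    """True if at least one calendar day has cycle_length or more matches."""
    counts = matches_per_day(match_dates)
    return max(counts.values(), default=0) >= cycle_length

# Hypothetical log: three matches on one day, two on the next.
log = ["2023-02-10", "2023-02-10", "2023-02-10", "2023-02-11", "2023-02-11"]
print(matches_per_day(log)[date(2023, 2, 10)])  # 3
print(any_day_spans_cycle(log))                 # False
```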

Well you definitely shouldn’t believe everything you hear on the Internet, so I applaud your skepticism. However, I would invite you to return to the original post and read it more closely. The only point that I qualified with “as a professional” was a passing remark about how Blizzard could say “there is no ‘loser’ queue” and how that could be true in a very narrow technical sense. That point is not central to the analysis at all.

The point of the original post is that my complete match data for Season 3 showed cycles of winning and losing that repeated on a stunningly regular cadence, and that such a phenomenon simply should not arise if the matchmaker is working as advertised. The argument proceeds from logical premises and empirical data, and the original data is included in the original post if you’d like to run the numbers yourself. You could believe that I scrub toilets for a living and the facts would still remain.

This would be a sample of N=1 if my hypothesis were “are all Overwatch players experiencing this same periodic matchmaking behavior.” Now, I admit that is an interesting question, and I would love to know the answer. I agree there is reason to be skeptical that everyone is seeing this behavior. I’m playing on a rather old account, and a lot of anecdotal evidence suggests that the matchmaker is struggling in particular to calibrate MMR under this condition. Not everyone is playing an old account, and not everyone is playing from the same point on the MMR distribution, and both of these conditions probably matter quite a bit. While I think there probably are other players experiencing a similar phenomenon, I also think there are others who are not.

To the point: The hypothesis under consideration in the original post is, “Given this observed time series, how probable is it that a correctly functioning matchmaker would generate such markedly and regularly bimodal match outcomes?” If you’re familiar with statistical inference, then you’ll recognize that this is perfectly analogous to measuring the log-likelihood of a sample with respect to a hypothesized distribution. Of course, analyzing a time series requires different tools than those applied to unordered samples, because points in a time series are generally not independent and identically distributed (iid), but that doesn’t mean analysis is impossible. In fact, folks do it all the time.
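One way to make that likelihood question concrete for an ordered series is a permutation test: shuffle the observed outcomes many times (which keeps the overall win rate but destroys any ordering) and ask how often a shuffled sequence shows as much power at the 25-match period as the real one. This is a swapped-in illustration, not the analysis described in this thread, and the input below is a hypothetical square wave rather than the actual match data.

```python
import math
import random

def power_at_period(outcomes, period):
    """Periodogram-style power of a 0/1 outcome series at one period (in matches)."""
    mean = sum(outcomes) / len(outcomes)
    re = im = 0.0
    for t, x in enumerate(outcomes):
        angle = 2 * math.pi * t / period
        re += (x - mean) * math.cos(angle)
        im += (x - mean) * math.sin(angle)
    return re * re + im * im

def periodicity_p_value(outcomes, period=25, n_perm=2000, seed=0):
    """Share of random shuffles with at least as much power at `period` as the observed order."""
    rng = random.Random(seed)
    observed = power_at_period(outcomes, period)
    shuffled = list(outcomes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps the win count, destroys the ordering
        if power_at_period(shuffled, period) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical input (1 = win, 0 = loss): win the first 13 of every 25 matches.
square_wave = [1 if t % 25 < 13 else 0 for t in range(200)]
print(periodicity_p_value(square_wave) < 0.01)  # True
```

A small p-value from a test like this says the ordering matters at that period; on its own it does not say why.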

Since you seem interested in the particulars, let me break down the analysis even further. Because the original post was rather long, I worried folks’ eyes would glaze over if I also went into the details of hypothesis testing, but I’m happy to elaborate. So here we go:

I wanted to be sure that I wasn’t spuriously imposing imagined patterns on the series, so I followed the standard approach of de-trending the series and searching for possible seasonality. The original chart is striking (it’s what inspired me to write all this up in the first place, because I was shocked the first time I saw it), but I wanted to make sure that the visual intuition was reproducible from impartial quantitative analysis. I de-trended the series by taking a first-order difference, then fit a sine function to the result using scipy.optimize.minimize, parameterized over the frequency. Sure enough, a period of 25 matches, plus or minus a tiny epsilon, popped out as the best-fitting parameter. Assuming that the seasonality here is additive and not multiplicative, I subtracted the fitted sine function from the de-trended series and applied the Augmented Dickey-Fuller (ADF) test to the result. The ADF test’s null hypothesis is that the series has a unit root (i.e. is non-stationary), so rejecting it supports the hypothesis that only stationary noise remains once trend and seasonality are removed, i.e. that seasonality explains most of the variation left after the trend is removed. The resulting ADF statistic comes in at about -2.6, with a p-value of 0.08, which is significant at the 10% level and supports the hypothesis at hand: that the seasonality accounts for most of the non-stationarity observed in the original series.
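The de-trend-then-fit step can be sketched without third-party packages. The post describes fitting a sine with `scipy.optimize.minimize`; this sketch instead scans a grid of candidate periods and keeps the one with the largest periodogram power, which for a single sinusoid is closely related to a least-squares sine fit. The synthetic series, the grid bounds, and the 0.5-match step are all assumptions for illustration.

```python
import math

def first_difference(series):
    """De-trend by differencing: d[t] = x[t+1] - x[t]."""
    return [b - a for a, b in zip(series, series[1:])]

def best_period(series, periods):
    """Return the candidate period whose sinusoid carries the most power."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]

    def power(period):
        re = sum(x * math.cos(2 * math.pi * t / period) for t, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * t / period) for t, x in enumerate(centered))
        return re * re + im * im

    return max(periods, key=power)

# Hypothetical series: a 25-match oscillation on top of a slow upward trend.
series = [0.5 + 0.2 * math.sin(2 * math.pi * t / 25) + 0.001 * t for t in range(200)]
candidates = [p / 2 for p in range(20, 101)]  # periods 10.0, 10.5, ..., 50.0
print(best_period(first_difference(series), candidates))  # 25.0
```

The residual left after subtracting the fitted sinusoid could then be passed to an ADF implementation; in Python, `statsmodels.tsa.stattools.adfuller` returns the test statistic and the p-value as its first two values.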

To the earlier point, this is untrue.

The original hypothesis technically hinges on the question of whether the time series is mostly white noise after the seasonality has been removed, and this in turn can be determined by testing for stationarity. The ADF statistic does this, and the numbers show that there’s only an eight percent chance we would draw a sample this stationary from a non-stationary series. A back-of-the-envelope calculation shows that a sample size of N=202 ought to be more than enough for 90% confidence with a 5% margin of error.
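For comparison, here is the textbook normal-approximation calculation for the margin of error of a single proportion, using the conservative p = 0.5. The post doesn’t say which back-of-the-envelope formula was used, so this is one possible reading, not a reconstruction of the author’s calculation.

```python
import math
from statistics import NormalDist

def margin_of_error(n, confidence=0.90, p=0.5):
    """Half-width of a normal-approximation confidence interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p * (1 - p) / n)

def required_n(margin, confidence=0.90, p=0.5):
    """Smallest n whose margin of error is at most `margin` under the same formula."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))

print(round(margin_of_error(202), 3))  # 0.058
print(required_n(0.05))                # 271
```

Under this particular formula, N=202 gives a margin of about 5.8% at 90% confidence, and a 5% margin would take about 271 matches. The formula also assumes independent samples, which, as noted above, a time series generally does not provide.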

The data is up there if you’d like to counter with your own analysis.

Thanks for sharing your data; it is relevant, and I would love to see more people share their data, as it gives us better insight into what might be happening.

However, this data doesn’t “contradict” my data, because that’s not how data works. It might contradict a hypothesis or a claim, but by definition data is what it is. It would be interesting to compare your results, but your dataset is missing a key feature necessary to reveal the presence or absence of streaks: You have not included wins and losses in the order that they happened. As such, we can’t run the same moving average on your data as was run above, which means we can’t unambiguously compare the two.
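To see why match order matters, here is a minimal trailing moving average over an ordered win/loss string. The window size is a hypothetical parameter; the window used for the original chart is not stated in this thread.

```python
def moving_win_rate(outcomes, window=10):
    """Trailing moving average of win rate over an ordered 'W'/'L' string."""
    wins = [1 if o == "W" else 0 for o in outcomes]
    return [sum(wins[i - window:i]) / window for i in range(window, len(wins) + 1)]

# Same totals (4 W, 4 L), different order, very different moving averages.
print(moving_win_rate("LWLWLWLW", window=4))  # [0.5, 0.5, 0.5, 0.5, 0.5]
print(moving_win_rate("LLLLWWWW", window=4))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Totals alone (for example, wins and losses per role) cannot distinguish these two cases, which is why the ordered record is needed.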

I would also point out that your dataset is considerably smaller than mine. The Support data contains only 52 total matches, which would be just barely large enough to see the cycle turn once, if it is in fact present.

At any rate, you certainly can’t combine the roles into a single dataset, because the matchmaker uses different MMRs for different roles – combining them would mix unrelated distributions, and give spurious results.

Nonetheless, if you do have the original outcome data in the order the matches were played, I would love to see it.

I addressed this point above, but it’s worth asking again: Do you really mean to suggest that my ability abruptly reverses quality, on schedule, every 25 games?

Thank you, and you’re right!

If you do find a way to break the streaks, that itself would be very interesting. It would be a little tricky to tease apart from other factors, so it’s probably worth forming a hypothesis in advance. Good question though!

Likewise, if you do keep this data in the upcoming season, I would love to see it. It’s likely something will change with the matchmaker in Season 4 – but it’s anyone’s guess what, and we won’t know if we don’t keep the record.

You’re welcome, and thanks for reading.

Also thanks for reading!

Strongly agree. I can even see, in principle, some reasons that one might use some kind of hidden metric for matchmaking – but it would be for reasons like smoothing out wild oscillations in rank e.g. for a new account. Even then, I don’t think it’s strictly necessary, and it’s clear that the existing system performs very poorly. Regardless of whether or not it’s doing what it’s “supposed” to do, it’s clear that everyone is having a bad time and hates it, and that should be enough reason to ditch it.

Phenomena like these are really noteworthy, and go back to another related problem with the current system: Groups are hidden. Blizzard argues that they don’t want people to form preconceptions of the match, and thereby “give up early”. Even if that is a worthwhile goal, the fact remains that it hides a factor that would be highly relevant in explaining how the matchmaker is giving out such strange results, and in interpreting some of the ways that a match may have gone the way it did; e.g. if one team is being hard carried by someone way outside the MMR distribution of the rest of the match, that is highly relevant. Even just seeing that information at the end of the match would be worthwhile.

Could you elaborate on this a little bit? If you’re talking about looking at the profiles of other players, I admit that I’ve had poor luck with this approach, as most of the profiles I try are private.

One thought: the problem may be that the ranked system focuses too much on wins and not enough on what happens in the match. For example, it treats a 0-3 blowout exactly the same as a tied match that you barely lose in overtime. I think a lot of players would be a lot less sour about losses if they knew they were still getting some credit for what they did during the match. Looking at what happens in game would also help the ranked system converge on accurate ranks faster, because it would be using more information, and it would somewhat decrease the impact that being placed on a “losing team” has on your rank.

Great post. Add this to the list of things to provide the next time someone says there’s no proof, or asks “where’s your evidence?”

Not that we need this post to prove the matchmaker is bad, since that proof already exists in countless thousands of eyewitness accounts, but this is a really great post to have.

They said it’s based on win rate, meaning you gain points when you win and lose points when you lose, no matter the stats within a game. IIRC they mentioned that the number of points gained or lost depends on the MMR of your opponents. I guess they can’t give you an exact calculation, since the points you’d earn or lose could be different after each match.
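Blizzard hasn’t published its formula, but the textbook Elo update is a common way to illustrate points that depend on opponent rating. This is a generic sketch, not Overwatch’s system: the K-factor of 32 and the 400-point scale are standard Elo conventions, and treating a whole team as one opponent rating is a further simplification.

```python
def expected_score(rating, opponent_rating):
    """Elo win probability for `rating` against `opponent_rating`."""
    return 1 / (1 + 10 ** ((opponent_rating - rating) / 400))

def elo_update(rating, opponent_rating, won, k=32):
    """New rating after one match: bigger gains for upsets, smaller for expected wins."""
    score = 1.0 if won else 0.0
    return rating + k * (score - expected_score(rating, opponent_rating))

print(elo_update(2500, 2500, won=True))  # 2516.0
print(elo_update(2500, 2700, won=True))  # ~2524.3 (an upset win earns more)
```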

IIRC the devs said the matchmaker uses a “predictability” value. And it actually makes sense. If you win, your MMR increases, and a question comes to the matchmaker’s mind: are you below your “real” MMR or not? So probably the more you win, the stronger the opponents you’ll face, with a multiplicative increase.
That helps put people close to their “true” MMR quickly enough.

The devs said on Eskay’s stream that there’s also an “uncertainty” value attached to MMR. From what I understand, it grows when people don’t play for a long time (so it’s probably tied to MMR decay).

The problem is that we don’t know when “predictability” and “uncertainty” kick in, or how hard they kick.
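As an analogy for an “uncertainty” value that grows with inactivity, the published Glicko-1 rating system increases a player’s rating deviation (RD) for each rating period without games, up to a cap. Nothing in this thread confirms that Overwatch uses Glicko; the constant c = 34.6 (which brings a typical RD of 50 back to roughly the 350 cap after about 100 idle periods) is an illustrative choice.

```python
import math

def inflate_rd(rd, periods_inactive, c=34.6, rd_max=350.0):
    """Glicko-1 style growth of rating deviation after inactive rating periods."""
    return min(math.sqrt(rd * rd + c * c * periods_inactive), rd_max)

print(round(inflate_rd(50, 0), 1))   # 50.0
print(round(inflate_rd(50, 10), 1))  # 120.3
print(inflate_rd(50, 1000))          # 350.0 (capped)
```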

Since MMR is hidden, I’m not sure how people can say that “MMR reflects performance”.
The same goes for people quoting their ranks in a game: it’s pointless, as Skill Rating and MMR are 2 different numbers and MMR is hidden. The displayed rank isn’t a reflection of MMR or matchmaking, and should stop being taken as one.
The devs say Season 4 will fix that by bringing SR and MMR much closer together, so we may be surprised by some SR jumps and drops in S4.

I’m sorry, I might not understand and I might be wrong, but I imported your data into a text editor: you have won 101 games and lost 101 games. That’s a perfect 50% win rate over your 202 games.
If we look at the win-rate graph, it’s easy to see you have been above a 0.5 win rate for longer than below it.
So I don’t understand how you can say that the matchmaker systematically forces losing games, since that would imply a win rate under 0.5. Or are you suggesting that a “forced 50/50 win rate” is a thing?

There are also things to consider when we treat this kind of data:

  1. The matchmaking system was updated during Season 3.
  2. The matchmaking system also takes queue time into consideration.
  3. We have no idea how your MMR varied after each game, especially since you might have been playing when there were too few players around your MMR to get a balanced game.

IMO your win-rate graph shows there was a problem around the 50th and 90th games. The hill after the 125th game might be weird, but maybe you did improve (or their update had an impact)? Maybe you started playing a different hero? But it shouldn’t have felt that bad.

That’s also a problem with the matchmaking design: should it be fair, or should it “feel good”?
I also feel that just by looking at outcomes without stats, it’s hard to know whether matchmaking is failing or not: there are tons of reasons to lose a match.
Even if we had stats, I feel it would still be hard to know precisely when matchmaking farts and when it doesn’t. Sometimes you can tell, when ONE player has much worse or much better stats than everyone else, or when one team gets wiped every minute.
But still, without a look at each player’s MMR, it’s all assumptions and no facts.
The devs said there were MMR problems and bugs… but that doesn’t mean we can spot those problems just by looking at our own history.

Amazing effort. Thank you for this analysis.

Update: I’ve included a short summary of this finding in a report on the Overwatch 2 Bug Reports forum. If you feel you’ve been impacted by this issue, you might consider signing on to that thread.

Now responding to the latest points:

This is very kind of you to say. If folks have found this post useful, then I am glad to have made it.

As I said, they do not disclose how exactly MMR is calculated. This omission is significant, though, as it muddies the question of what MMR reflects, or whether it reflects anything at all. However, we can be reasonably certain that it is the most important variable in assigning matches (as its name suggests), and so we can make indirect inferences about it by observing a large number of matches.

Which immediately contradicts the claim that the matchmaker uses only wins and losses in calculating who will match with who.

Do we have any particular evidence that the increase is multiplicative as opposed to e.g. additive?

This is yet another variable in the calculation of MMR, other than wins and losses. We now have that MMR is based on:

  • Recent match outcomes (win/loss)
  • Intensity of engagement as a function of time
  • An unknown confidence term (“predictability value”)

To the point of MMR decay:

Here’s a quick plot of the same data, charting the number of matches played per calendar day:

https://postimg.cc/McBKY1x3

It’s worth pointing out that the data above is spread more or less evenly over the observation interval. In the 49-day observation period, there were only five day-resolution intervals wherein I played no matches at all, and none of these lasted more than three days. If my MMR did decay, then it did so very abruptly.
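The gap check above can be done mechanically from the match dates: take the distinct play days, sort them, and measure the longest run of calendar days with no match. The dates below are hypothetical, not taken from the dataset.

```python
from datetime import date

def longest_idle_stretch(play_dates):
    """Longest number of consecutive calendar days with no matches between play days."""
    days = sorted(set(play_dates))
    gaps = [(later - earlier).days - 1 for earlier, later in zip(days, days[1:])]
    return max(gaps, default=0)

played = [date(2023, 2, 8), date(2023, 2, 9), date(2023, 2, 13), date(2023, 2, 14)]
print(longest_idle_stretch(played))  # 3 (Feb 10, 11 and 12)
```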

I assure you that people absolutely do say this, and if you spend any amount of time on this forum, you will definitely encounter them.

If I understand what you’re saying here, it would appear that public statements by the development team directly contradict you. In a January 30 post to the Developer Blog, Blizzard’s spokesperson says:

A player’s visible rank will move towards their rating over time as they continue to play during a season.

If people are seeing numerous out-of-place ranks in their matches, it suggests either that MMR has failed to move close to the visible rank, or that it has and that the matchmaker is simply making poor matches.

Thank you for taking the time to look more closely at the data. I have taken this fact into account; I noted it in the original post:

Please remember, though, that this is a global mean. As such, there are many different sequences of outcomes that have the same mean. For example, both of these sequences have a mean win rate of exactly 0.5:

WLWLWLWLWL
WWWWWLLLLL

However, they represent very different states of affairs. Yes, the data above has a global mean of 0.5, but it also has an extremely high variance through time. It reaches the 0.5 value through a wild oscillation between very high highs and very low lows, and it just so happens that these balance out by the time we reach the end of the table. This might be incidental, and it might not, but I don’t think there’s enough information in the analysis above to say with certainty.
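The difference between those two sequences can be put into a single number with the Wald-Wolfowitz runs test, a standard check for streakiness that is not part of the original analysis. It counts runs (maximal streaks of identical outcomes) and compares the count with what a random ordering of the same wins and losses would produce: too few runs means streaks, too many means unusually regular alternation.

```python
import math

def runs_test_z(outcomes):
    """Wald-Wolfowitz runs count and z-score for a 'W'/'L' string (normal approximation)."""
    n_w = outcomes.count("W")
    n_l = outcomes.count("L")
    if n_w == 0 or n_l == 0:
        raise ValueError("need at least one win and one loss")
    n = n_w + n_l
    runs = 1 + sum(1 for a, b in zip(outcomes, outcomes[1:]) if a != b)
    expected = 1 + 2 * n_w * n_l / n
    variance = 2 * n_w * n_l * (2 * n_w * n_l - n) / (n * n * (n - 1))
    return runs, (runs - expected) / math.sqrt(variance)

print(runs_test_z("WLWLWLWLWL"))  # (10, ~2.68): more alternation than chance
print(runs_test_z("WWWWWLLLLL"))  # (2, ~-2.68): fewer runs than chance, i.e. streaks
```

With only 10 matches the normal approximation is rough; it is more reasonable for a season-length series. Note also that a runs test detects streakiness in general, not a specific 25-match period.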

I have considered the hypothesis that the matchmaker is pushing outcomes toward a global mean of 0.5 in all cases, but I don’t think I have enough information, at least by myself, to say how probable this might be. There’s reason to suspect it, yes, but it’s not self-evidently true either.

I’ve stated that the “matchmaker systematically forces players into losing games” because it demonstrably switches between modes of favorable and unfavorable matchmaking, and appears to do so on a remarkably regular cadence. It strains credulity to suggest that any of the other factors that could be determining match outcome alternate mode every 25 games, and continue to do so over the course of 200 matches.

To these points:

It’s remarkable that all of these factors might vary match outcome and the signal still shows a strong 25-match periodicity. That means the influence of matching is so strong that even these numerous possible sources of noise are insufficient to drown out the signal.

I would also point out that I marked the dates of the two major patches (February 21 and March 07) on the original plot of the win rate. These are the dark-grey dashed vertical lines. They do not appear to coincide with any particular change.

My opinion? This is a game, and if it feels bad, then it’s a bad game. People will stop playing.

In this case, it’s not hard. Go back to the arguments in the original post: if the matchmaker were matching evenly, then MMR would converge, and match outcomes would resemble white noise, possibly colored by a weak upward or downward trend as long-term player behavior changes. What has demonstrably happened in this case is that the matchmaker produced regularly repeated streaks of winning and losing, and did so over the course of two months of steady play. This was not a one-off or a black swan; it was a marked behavior of the system, which it exhibited consistently over a long period of time.
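As a rough baseline for the “white noise” case, one can simulate seasons of independent 50/50 matches and record the longest streak in each. This assumes every match is an independent coin flip, an idealized null rather than a model of the real matchmaker; the season length of 202 and the 2,000 simulated seasons are arbitrary choices. A baseline like this shows how long streaks get by chance alone, which is the yardstick observed streaks would be compared against.

```python
import random
import statistics

def longest_streak(outcomes):
    """Length of the longest run of identical consecutive outcomes."""
    best = current = 1 if outcomes else 0
    for prev, cur in zip(outcomes, outcomes[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best

def simulate_longest_streaks(n_matches=202, n_seasons=2000, seed=0):
    """Longest streak in each simulated season of independent 50/50 matches."""
    rng = random.Random(seed)
    return [longest_streak([rng.random() < 0.5 for _ in range(n_matches)])
            for _ in range(n_seasons)]

streaks = simulate_longest_streaks()
print(statistics.median(streaks), max(streaks))
```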

So, I apologize in advance if it seems like I’m being cross with you here. I sense you’re making a good-faith argument, and although I disagree, I do still respect and appreciate your engagement. I’ll try to avoid a harsh tone, but that can be difficult in text, and I admit that I’ve grown weary of certain arguments, enough so that they’ve started to irritate me. That’s not your fault, and so please understand my next remarks are directed more at a genre of argument than at you personally.

To the point:

Please step back and consider the claim at hand. We are literally looking at a spreadsheet of 200 dated match outcomes subjected to statistical analysis. If that is not factual enough, then what is? What is it going to take? 100 more matches? 1000 more matches? A dozen more players showing up with spreadsheets just like this one? I am literally sitting down and transmitting facts directly to you, the reader – if this is not factual enough, then what will be?

I have easily read dozens of posts in defense of the matchmaker, asserting that any claim against it is “all assumptions” and “no facts”. None of these posts seem to offer any meaningful explanation of their own, beyond “works fine for me” and “everyone else is whining or imagining things.” Their only “facts” are their own gut feels that things seem fine. Of course, they have endless ways to explain away every claim to the contrary – but they never seem to indicate that there is any evidence that could possibly sway their unshakable faith in the goodness and rationality of the Overwatch 2 matchmaker.

I’m sure there is some variation in player experiences of the current matching regime. It probably is working for some people out there. But the temperature of the forums shows that there is a predominantly negative experience of matching right now, and it strikes me as nothing short of magical thinking to dismiss all those reports as nothing but delusion and hysteria, especially when they are confirmed by empirical observation.

The facts at hand are abundant. The matchmaker produces streaks, and it does so with eerie regularity. According to the statements of the Overwatch developers, it is not supposed to behave that way. Quoting again from the same developer blog referenced above:

Sometimes, if a player goes on a very long win/loss streak, it’s indicative that their internal rating is not well-calibrated. The best way to calibrate your rank is to continue playing competitively. The more data we have, the closer you’ll get to a rank that best represents your skill.

I want to emphasize, again, that I have been playing Overwatch 2 on this very same account since the start of Season 1. In fact, I have been playing Overwatch since Season 3 of Overwatch 1. The matchmaker has abundant data on me. Exactly how long should it take to become “well-calibrated”?

The same post continues:

However, there are times when players are going to get lucky with their win streaks or the opposite with loss streaks.

echoing a common talking point of MMR apologists: that any streaks are just the sort of outliers that inevitably appear now and then in any large sample of a random variable. But if these streaks are outliers, then how is it that they repeat with such uncanny regularity? Do you really mean to suggest that my “luck” abruptly changes every 25 games, and that it has been doing so consistently for the entirety of Season 3? Really? If “luck” is the only explanation that even developers can offer for these kinds of streaks, then we must look elsewhere for an explanation, because “luck” does not change on a fixed schedule.


I’m sorry to go off like that, but I’ve kept the text as written because I really think it needs to be said: The matchmaker is unequivocally broken, and we need to stop telling people otherwise. The evidence is absolutely there. If this is not enough, then what is?

Were all 202 games played on a single hero (one-tricked) as a control, or were different heroes used?

Bottom line is, you can’t just play the game and trust it.

You have to dance around the matchmaker.

That means the matchmaker itself is too big of a factor. It should feel non-existent.

It doesn’t. It’s rigged.

From a podcast, I know they already have these stats (how “balanced” matches actually are) but don’t include them in a player’s rating.

I’ve also noticed this pattern; let’s call it the “difficulty pulse”. The truth is that if matches are not challenging, players will simply lose interest, and at the same time, if the challenge is too high for an extended period, it will lead to “burnout”. My observation is that the algorithm will at some point substitute higher-ranked players into your role on the opposing team. If you’ve made progress, you should be able to get over this “difficulty pulse” hill. I don’t know what further consequences it has, but I think the system then starts to fulfill a second role, namely increasing your MMR.

My ability to concentrate is unsteady, and when I regain my focus, I manage to win through these pulse hills, so it’s not really a “loser queue”, just matches of a higher level than those at the bottom of the pulse. Due to my health, I have gone through this many times and have always managed to return to my place without any major problems. So personally, I think the system does its job, and I’m glad it works the way it does now, because I used to be permanently locked at a low rank, and as a result I stopped playing because I was bored of the game. Soon I will start a new treatment that should stabilize me and improve my results; then I will write about my observations, and maybe together we can solve the mystery. It’s good to know there’s someone with an analytical approach here.

The role delta idea is better, but it is not a solution IMO.

You’re very rarely going head to head with JUST your equal matchup. It’s not a 1v1 game.

You’re fighting the whole team, or often someone in another role who counters you, or the other way around.

An example is being a Gold support going up against a Masters Genji… Genji would rarely fight the other DPS, especially when the game doesn’t make it clear which DPS is his delta match.

Genji is going to make Genji plays, which means diving and assassinating supports that have no escape skills.

He (it doesn’t have to be Genji, just an example) could absolutely dominate supports with low mechanical skill, remove the healing from the enemy team, and snowball the fight.

It is interesting that you put so much effort into this.

But anyone that had played the game for a long time and is not the top 0.5% of the player base is already well aware of this.

They tinker with MMR in the background once in a while and then you move along the ladder.

It has so little to do with your actual play and so much more to do with your hidden MMR vs your visual ranking.

If your MMR is out of sync with your SR… the matchmaker gives you games designed to either de-rank or up-rank you…

This was extremely apparent in recent weeks when they monkeyed with MMR and many went up multiple ranks quickly… then in a “stealth” change…deranked everyone because the top 0.5% were complaining about “rank inflation”.

Basically, your visual rank is EXTREMELY contrived.

Overall, their MMR evaluation of your skills is not bad at all. But the way the game up-ranks or de-ranks you based on recent past performance, instead of simply letting SR and wins/losses move you organically along the ladder, is very upsetting to many players.

The ultimate outcome would be the same arbitrary ranking, but they want to get you there quickly, which has actually reduced smurfing.

The problem is that this design would work well if there was only 1 hero and 1 powerset to rate your performance on. And that is simply not how Overwatch works.

Basically, by trying to micromanage the ladder distribution, they have shot themselves in the foot. And nobody is getting paid to make the obvious and truthful observation that the system is FUBAR’d at this point.

Yes and no. Yes, because indeed, it doesn’t use “only” win rate. But no, because that predictability tries to avoid win streaks and loss streaks.
Also, we must be careful not to mix up “matchmaker” and “MMR”. MMR is based on your win rate. The matchmaker looks at your MMR and uses “predictability” and “uncertainty” to try to find you opponents other than just the people with the closest MMR.
On Eskay’s stream, one dev gave an example of 9 GM1 players, 1 GM3, and 9 GM5 players queuing. He said intuition would match the GM3 with the 9 GM1s to get a more balanced game, but they want the matchmaker to set up a “more interesting” match, so it mixes everyone.

Okay, it’s your opinion and I hear that. Personally, I disagree, but it’s a whole design discussion, and I know that games nowadays are designed to feel good rather than be fair and square. I understand that point; I just stubbornly disagree, and I’m probably wrong.

Yes. Look at people doing “Bronze to GM” streams. By looking at the levels of their opponents, you can tell their MMR is going up. It makes sense for matchmaking to give you better opponents multiplicatively, so that people who deserve Master don’t need to win 100 games to get out of Diamond. It’s also something the devs mentioned on Eskay’s stream: that MMR didn’t move quickly enough. That can be understood in 2 ways: either the matchmaker didn’t match you against strong enough opponents, or you didn’t get enough points for beating much better opponents.

I might be expressing myself the wrong way. Uncertainty is used for matchmaking, not for MMR calculation. It’s probably something like “target opponent MMR = your MMR ± uncertainty ± predictability” (or uncertainty and predictability might be the same variable, just with 2 semantic uses).

About MMR decay, I have no idea what the minimum time to trigger decay is. Nowadays, I play with gaps of 5 or 6 days between sessions. I don’t feel that my MMR decays, or if it does, it quickly goes back to normal after a few games.

That doesn’t mean they’re right. I mean, MMR could reflect people’s performance, but since nobody knows their own MMR… how can they assume that? There might be a player with a Plat MMR whose displayed rank is Diamond; how would they know what rank they actually deserve?

From the same blog post:

What your line means is that the more one plays, the more accurate MMR should become, and eventually SR and MMR should be close to each other.
If you read the forums, you’ll see a lot of players saying their SR doesn’t move even after a 5/0 (or 7/0 in previous seasons), or saying their rank moves even with a 50% win rate.
During Season 2, my rank stayed the same with a 40-45% overall win rate.

How SR is calculated is a mystery. MMR has an impact on it, but it seems that stats within a game have an impact too. And IMO, that’s why they’re going to change it in S4, so that MMR matters more than the stats.

Yes. The devs also said matchmaking is allowed to put you with players one division higher or lower than you. So a Plat player can be matched with Gold and Diamond players. That range might be too large, but it’s not a matchmaker “bug”, just a debatable “flaw” in the design.

So I’d ask how your graph is calculated, because if you have a 0.5 win rate at the end, why does your graph go under 0.5? Why does your graph start at 0.3 when your first win is your 6th game? It should be about 0.17 (1/6) then. I’m really not sure how you got that graph.
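One possible source of the confusion (an assumption, since the chart’s exact method isn’t spelled out in this thread) is that a cumulative win rate and a moving-average win rate measure different things. The cumulative rate counts every match so far, so after five losses and a win it is 1/6; a moving average counts only the most recent window, so it can swing far above or below 0.5 even when the season total is exactly 0.5. The 18-match season and 4-match window below are hypothetical.

```python
def cumulative_win_rate(outcomes):
    """Win rate after each match, counting every match played so far."""
    rates, wins = [], 0
    for i, o in enumerate(outcomes, start=1):
        wins += 1 if o == "W" else 0
        rates.append(wins / i)
    return rates

def trailing_win_rate(outcomes, window):
    """Win rate over only the most recent `window` matches."""
    return [sum(o == "W" for o in outcomes[i - window:i]) / window
            for i in range(window, len(outcomes) + 1)]

season = "LLLLLW" + "WWWW" + "LLLL" + "WWWW"  # hypothetical: 9 W, 9 L
print(round(cumulative_win_rate(season)[5], 2))  # 0.17 after the 6th match
print(cumulative_win_rate(season)[-1])           # 0.5 overall
print(min(trailing_win_rate(season, 4)))         # 0.0
print(max(trailing_win_rate(season, 4)))         # 1.0
```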

Also, MMR is a result of your overall performance; there’s no reset (unless decay kicks in), so with a 0.5 overall win rate, one can assume your final MMR is relatively close to your starting MMR. Of course, there’s a delta coming from the variation in points gained or lost relative to your opponents’ MMR, but we have no clue how big that delta could be.

Beware of circular reasoning. Also, people doing “Bronze to GM” runs show that hypothesis isn’t true.

First, your graph contradicts it: your first 25 games show an increasing winrate. Same for the following 25 games. Also, the big climb starts before game 75 and ends after game 100.

But let’s analyze that 25-game periodicity:
1st part: 9 wins (.36 winrate), all games played after 10pm.
2nd part: 13 wins (.52 winrate): 1 game in the morning, 6 games (at .5 winrate) in the afternoon, the rest at night.
3rd part: 8 wins (.32 winrate): 4 games in the afternoon (all defeats), the rest at night.
4th part: 16 wins (.64 winrate): 5 games before 12pm (.8 winrate), 5 games in the afternoon before 8pm (.6 winrate), the rest at night.
5th part: 14 wins (.56 winrate): 6 games between 8pm and 10pm (.5 winrate), the rest at night.
6th part: 17 wins (.68 winrate): 1 game in the afternoon (defeat), 1 game before 10pm (win), the rest at night.
7th part: 10 wins (.4 winrate), all games at night.
8th part: 14 wins (.56 winrate), all games at night.
(The average of the winrates is .505, as I didn’t count the last 2 games, to stick to the 25-game periodicity.)
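A minimal Python sketch of the 25-game split described above. The function name and the synthetic sequence are illustrative assumptions: only the per-block win counts come from the table, and the order of wins and losses inside each block is invented, since the raw match history isn’t reproduced here.

```python
def period_win_rates(results: str, period: int = 25) -> list[float]:
    """Split a W/L string into consecutive blocks of `period` games and
    return the win rate of each complete block. A trailing partial block
    (here, the last 2 games) is dropped, matching the split described above."""
    n_blocks = len(results) // period
    return [
        results[i * period:(i + 1) * period].count("W") / period
        for i in range(n_blocks)
    ]


# Synthetic sequence: only the per-block win counts come from the table above;
# the order of wins and losses inside each block is invented for illustration.
wins_per_block = [9, 13, 8, 16, 14, 17, 10, 14]
results = "".join("W" * w + "L" * (25 - w) for w in wins_per_block) + "WL"
rates = period_win_rates(results)
print(rates)
print(sum(rates) / len(rates))  # mean of the 8 block win rates
```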

We can see here that there’s no “lose/win” periodicity, as the winrate is over 0.5 from period 4 through period 6.

About the updates: the 2 updates are client-side, but the matchmaker is a server-side thing.
According to Twitter, they updated the matchmaker on March 23:

So it’s really hard to know how many updates have been released to matchmaking, and therefore it’s really hard to know whether a win/lose streak comes from an update kicking in or from a remaining flaw in the matchmaking system.

To be clear, I’m not defending the matchmaker. My criticism is of the way the data is being analyzed. There are problems with the matchmaker, but it’s just difficult to tell what they are, because there are a lot of things that can change the outcome of our games.

Here, as the matchmaker has been updated several times, we can think that your games BEFORE the update may show a problem that has since been fixed. E.g. the devs mentioned (on Eskay’s stream) that there was a bug causing MMR to be lower than it should be for many players, and that this led to the feeling of MMR “inflation” during Season 3. The devs explained that MMRs aren’t inflating; they’ve grown because a previous bug had prevented them from doing so.

The fact that the devs have probably updated the matchmaker more often than the client reduces the significance of a single player’s history; since they might be fixing a problem seen across thousands of players, your history can’t keep up with the update rhythm, and it gets harder to see evidence in it. There might be evidence in your history; it’s just hard for us players to see it and to know whether the problem has been fixed yet.

It contradicts the idea that matchmaking tries to aim for a 0.5 winrate at all times. But let’s see:
https://postimg.cc/K3R5d6Sp
Average win streak: 1.906. Average losing streak: 1.87.
Median for both: 1.
So that means your win streaks are slightly longer than your losing streaks on average, and most of them are very short. You have only 2 streaks of 10, 2 streaks of 5, 4 streaks of 4, 13 streaks of 3, 29 streaks of 2, and 58 of 1.
Are you sure that the matchmaker does produce streaks? I mean, 82.07% of streaks are 2 games or shorter.
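To reproduce streak figures like these from a W/L string, a run-length encoding is enough; the sketch below uses `itertools.groupby`, and the helper names are illustrative. The `independent_share_at_most` baseline is an added assumption, not part of the original analysis: if every game were an independent coin flip, a streak would continue with probability 0.5 at each step, so about 75% of streaks would be 2 games or shorter. That gives a figure like 82.07% a reference point, though it doesn’t settle the question by itself.

```python
from collections import Counter
from itertools import groupby
from statistics import mean, median


def streaks(results: str) -> list[tuple[str, int]]:
    """Run-length encode a W/L string, e.g. 'WWLW' -> [('W', 2), ('L', 1), ('W', 1)]."""
    return [(outcome, len(list(run))) for outcome, run in groupby(results)]


def streak_summary(results: str) -> dict:
    """Streak statistics of the kind quoted above. Assumes the string
    contains at least one win and one loss (mean() fails on empty input)."""
    runs = streaks(results)
    lengths = [n for _, n in runs]
    return {
        "mean_win_streak": mean(n for o, n in runs if o == "W"),
        "mean_loss_streak": mean(n for o, n in runs if o == "L"),
        "median_streak": median(lengths),
        "histogram": Counter(lengths),
        "share_at_most_2": sum(n <= 2 for n in lengths) / len(lengths),
    }


def independent_share_at_most(k: int, p_continue: float = 0.5) -> float:
    """Baseline under the ASSUMPTION that games are independent and a streak
    continues with probability p_continue: P(length <= k) = 1 - p_continue**k."""
    return 1 - p_continue ** k


print(streak_summary("WWLWLLLW"))    # toy sequence, not the real match history
print(independent_share_at_most(2))  # 0.75 when every game is a coin flip
```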

Now on a more personal note :

  1. It doesn’t mean the matchmaker works perfectly fine. It means the problems may lie beyond what we can figure out from the data. It might be that the points earned/lost in each match cause your MMR to be in the wrong place, or cause the matchmaker to create weird games.
  2. It doesn’t mean that your feelings aren’t legit.
  3. It doesn’t mean I am right and you are wrong. I just want to change the approach of the analysis to help figure out whether we can detect a matchmaking problem through data. I just want to be sure what to charge the matchmaker with.
  4. The 10-game streaks are a problem, and the feeling of the winrate bouncing over weeks/numbers of games is a problem. Especially the win streak, as it happens “late” in the season, so yeah, MMR shouldn’t move that much at this point imo.

Some other areas might need investigation, like winrate per day of the week. Maybe the weekend population has an impact on winrate/matchmaking. Same for hour of day.
Maybe those stats would reveal something interesting about the player population causing the matchmaking to act differently? E.g. I feel the ends of Seasons 1 and 2 were “easier”, and my rank reached its peak in the last week of each season. Pretty sure it’s because some players stop playing Competitive at the end of seasons.
Also, there are “hours” when I feel the level of my games dropping. It’s a consistent feeling I have, but I can’t tell if it’s coming from the game (player population or matchmaking) or from my own fatigue (especially after 11pm).

You played hundreds of games, but it’s not known how many games your teammates have played. With playtime, MMR should settle at your deserved value, but what if people around your MMR have played only 10 games and get matched with you?

But again, that doesn’t mean the matchmaker works fine and you’re being delusional. I just want to examine the problem from all sides to know exactly what’s wrong with the matchmaking system.

PS: I’ve arranged your data to see your MMR variation from win and loss streaks, assuming no variation in the points earned/lost per match (and I forgot to save it…), but there was an interesting thing: before your 10-win streak, you were under your initial MMR (you’d been losing more than winning). The 10-win streak happened on March 5th. The update log popped up on March 7th, but I don’t remember the exact time of the update, so it might be the cause. Maybe the 10-win streak comes from the matchmaking update. If so, it’d mean there was a problem in matchmaking keeping you under your starting MMR.
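The “no variation in earning/losing points” idea above amounts to a ±1 random walk. A hypothetical sketch, assuming every match is worth exactly one point in either direction (function names are illustrative, and the real system almost certainly weights matches differently):

```python
from itertools import accumulate


def unit_step_walk(results: str) -> list[int]:
    """Hypothetical MMR proxy: +1 for each win, -1 for each loss, starting from 0.
    ASSUMES every match is worth the same amount, which the real system
    almost certainly does not do."""
    return list(accumulate(1 if r == "W" else -1 for r in results))


def games_below_start(results: str) -> list[int]:
    """1-based match numbers after which the walk sits below its starting value."""
    return [i for i, level in enumerate(unit_step_walk(results), start=1) if level < 0]


print(unit_step_walk("LLWWW"))     # [-1, -2, -1, 0, 1]
print(games_below_start("LLWWW"))  # [1, 2, 3]
```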

That is a decent example. But it’s much rarer than the tank vs. supports case.

A Diamond tank paired with Silver supports is going to feed and die until they adjust their playstyle. AND this happens far more often than a dominant DPS taking over the lobby.

Today I learnt that CSV stands for comma-separated values.

Oh here we go, someone took ONE Comp Sci class and now thinks they’re God’s gift to the matchmaker

in case I need to spell it out to anyone, I didn’t read that

An excellent question!

I should have included this in the original write-up – though I’m hesitant to edit retroactively, because I would like to keep the post un-edited so as to preserve the provenance of the dataset.

To your question: I predominantly, but not exclusively, played Kiriko, and the distribution is heavily skewed toward her: in Season 3 I played well over 30 hours on her, and 2 on her closest competitor. (I don’t have the exact figures in front of me at the moment.)

For the purposes of experimental design, a single character might be preferable, because it’s one less source of variation. However, there are times it’s not tractable, for instance because someone else has already picked that character. Other times, I switched because the game seemed to indicate it, e.g. to Brigitte, to deal with aggressive flankers.

Likewise, you could make a convincing argument that character choice is an aspect of skill at the game, and that choosing a single character might confound the true win rate.

At any rate, I can safely say that the trends observed in the chart do not correspond to changes in choice of character – but this is a good thing to consider.

Strongly agree. The fact that we are all talking about the matchmaker so extensively is itself a symptom of its failure.

Blizzard has so much data that I wish we could see.

This is an interesting hypothesis – but how can you tell that a higher ranked player is in your role on the opposing team?

Definitely true that it’s not a “loser queue”, and that term is misleading. The reason I still use it here and there is that it reflects the problematic aspect that has so many people upset: the prevalence of winning and losing streaks makes matches feel predetermined, even where they’re not. The metaphor of being dropped into a doomed queue appeals to the imagination – even if it’s not quite accurate.

This is a pretty interesting report in its own right, and is one of the only defenses of the current queue that I’ve seen and that I actually believe. I think these “difficulty pulses” might play better if we knew we were facing them, and they weren’t just materializing as a ghostly aura of ill fate. Definitely agree that the game should shake up the matches now and then, so that players that might be stuck at a local minimum can get out of it.

I’ll definitely think about this point more.

I would look forward to hearing your thoughts on it. May you be well.

It’s always useful to learn a new initialism!

I have long suspected a problem like this one, but I have been hesitant to conclude it is the case without strong evidence that disentangles my own contributions to the outcome. Things have felt severely messed up for a while now, but I was personally unsure how much the perception of a problem might be exaggerated by my own negative affect.

Reminding myself of this fact is how I keep playing competitive.

This is an astute summary.

I appreciate your attention to precise language, and I would agree that I could do more to make this distinction clear in my exposition. In particular, if MMR is convergent and meaningful, then the behavior we’re seeing is due to exogenous configuration of the matchmaker (e.g. its preference for shorter queue times over better matches) – and that could be important to the discussion.

Unfortunately, these two concepts blend too easily together because we don’t actually know what MMR is or how it changes, which means that it becomes difficult to separate from other behaviors of the matchmaker and the queue population.

Nonetheless, thank you for this note.

Make no mistake – I’m definitely down for a good challenge, and that’s why I’m playing competitive. But it’s hard to call a game “fair” when extrinsic forces are skewing the outcome.

Okay, that does make sense; it means that winning streaks would follow a power law, which would be appropriate to the developers’ likely goals. Do you have a link to the specific reference?

I don’t think they are right either, but that doesn’t stop them from assuming that Overwatch 2 Competitive is the best of all possible competitive gaming worlds.

Can confirm, and have observed similar phenomena myself.

The numbers are a puzzle though; I’m not willing to completely dismiss the possibility that numbers do matter, but:

  • Blizzard seems to deny that match stats are used in the calculation
  • Analytically, it would be very difficult to normalize stats across matches in a way that made them useful as a stable ranking statistic

One possible alternate explanation is that MMR increases and decreases are proportional to MMR differences across teams, and that match stats correlate with these differences. The problem with that explanation is that one would expect stats to be worse if playing against “better” players, and better when playing against “worse” players – which reverses the direction of the correlation. A twist on the same explanation might be that high stats coincide with “upsets” i.e. games in which a better-than-average performance leads to a win over a higher-ranked team.

However, I admit that I remain pretty uncertain on this point.
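One concrete (and purely assumed) way gains and losses could depend on the rating gap is the standard Elo update sketched below. Blizzard hasn’t confirmed that MMR works this way, and treating each team as a single rating (e.g. the team average) is a further simplification. It does illustrate the “upset” variant: a win over a higher-rated side moves the rating more than an expected win does.

```python
def expected_score(rating: float, opponent_rating: float, scale: float = 400.0) -> float:
    """Standard Elo expected score: a value in (0, 1) that depends only on
    the rating gap (0.5 when the ratings are equal)."""
    return 1.0 / (1.0 + 10 ** ((opponent_rating - rating) / scale))


def elo_update(rating: float, opponent_rating: float, won: bool, k: float = 32.0) -> float:
    """Elo update: the change is k * (actual - expected), so an upset win
    against a higher-rated side moves the rating more than an expected win."""
    actual = 1.0 if won else 0.0
    return rating + k * (actual - expected_score(rating, opponent_rating))


print(elo_update(1500, 1500, won=True))  # 1516.0: even match, half of k
print(elo_update(1500, 1700, won=True))  # upset: gains more than 16
print(elo_update(1500, 1300, won=True))  # expected win: gains less than 16
```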

The graph shows a moving average. That means that the Y-coordinate of each point on the line represents the win rate over the M games that came immediately before it, for some chosen value of M. In this analysis, I used M=20, for reasons discussed in the original post. So, for example, if the chart shows 0.3, that means that I won 6 out of the last 20 games, because (6/20) = (3/10) = 0.3

The importance of the moving average is that it shows the shorter-term trend at each point in time (i.e. more wins or more losses) without erasing long-term variation (as a global mean does).
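A sketch of a trailing moving average with M=20, under the assumption that each window ends at (and includes) the game it is plotted at; the post doesn’t specify the exact windowing used for the chart, and the function name is illustrative. One consequence relevant to the earlier question about where the line starts: in this version no value exists until M games have been played, so the first point already averages 20 games.

```python
def trailing_win_rate(results: str, window: int = 20) -> list[float]:
    """Win rate over each run of `window` consecutive games, ending at game
    `window`, `window` + 1, ..., len(results). Nothing is defined for the
    first window - 1 games, so the curve starts at game number `window`."""
    if len(results) < window:
        return []
    wins = [1 if r == "W" else 0 for r in results]
    running = sum(wins[:window])
    rates = [running / window]
    for i in range(window, len(wins)):
        running += wins[i] - wins[i - window]  # slide: add newest game, drop oldest
        rates.append(running / window)
    return rates


print(trailing_win_rate("L" * 14 + "W" * 6))  # [0.3]: 6 wins out of the last 20
```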

No, I don’t think there’s a contradiction. I’m not entirely sure how you’ve defined the “parts” here, but judging from the fact that there are eight of them, I would guess these correspond to the alternating periods of “predominantly winning” and “predominantly losing”. What the chart shows is that the value climbs (with some slight variation) for 25 matches, and then falls for 25 matches, and repeats.

You would be right to point out that the resulting line is not strictly periodic: the local maxima and minima do vary somewhat in the values that they take.

But even your table above shows that the even-numbered intervals exhibit a higher win rate on average, and the odd-numbered intervals a lower one, which is exactly the sort of alternating behavior I’m calling out.
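The even/odd comparison can be checked directly from the win counts in that table (25 games per period). A small sketch with a hypothetical helper name; note that period 5 (.56) is above 0.5, so the alternation shows up as an average tendency rather than in every single interval.

```python
def parity_win_rates(wins_per_period: list[int], period: int = 25) -> tuple[float, float]:
    """Pooled win rate of the odd-numbered (1st, 3rd, ...) and even-numbered
    (2nd, 4th, ...) periods, each period holding `period` games."""
    odd, even = wins_per_period[0::2], wins_per_period[1::2]
    return sum(odd) / (period * len(odd)), sum(even) / (period * len(even))


# Win counts per 25-game period, taken from the table quoted above.
print(parity_win_rates([9, 13, 8, 16, 14, 17, 10, 14]))  # (0.41, 0.6)
```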

This is definitely true, and I did overlook that fact.

As for the date, though, March 23 is the last date in the dataset. Unfortunately, I don’t yet have enough data beyond that point to say anything of interest.

Criticism of the analysis is definitely valid and welcome, though I’m fully prepared to defend my conclusions.

There definitely are ways that game outcome can change, and the data reflect this in the fact that the line seldom goes exclusively up or down. There are games here and there that seem to go against the prevailing trend. But it’s important to point out that the trend persists anyway, and that the exceptional games don’t erase the average ones.

Returning to an example from earlier, both of these sequences represent a 0.5 win rate:

  1. WLWLWLWLWLWLWLWLWLWL
  2. WWWWWLLLLLWWWWWLLLLL

By most common definitions, Sequence 1 displays no streaks of any kind (unless you want to count a single outcome as a “streak”). By contrast, Sequence 2 exhibits two 5-match winning streaks, and two 5-match losing streaks. The overall mean erases that important distinction.
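Counting runs makes the difference between these two sequences explicit. The sketch below (helper names are illustrative) also computes the number of runs one would expect if the same wins and losses were shuffled at random, which is the mean used in the Wald-Wolfowitz runs test: far more runs than that suggests alternation, far fewer suggests streakiness.

```python
def count_runs(results: str) -> int:
    """Number of maximal blocks of identical outcomes (1 + number of W/L switches)."""
    if not results:
        return 0
    return 1 + sum(a != b for a, b in zip(results, results[1:]))


def expected_runs(n_wins: int, n_losses: int) -> float:
    """Mean number of runs if the same wins and losses were shuffled at random
    (the expected value used in the Wald-Wolfowitz runs test)."""
    n = n_wins + n_losses
    return 2 * n_wins * n_losses / n + 1


seq1 = "WLWLWLWLWLWLWLWLWLWL"
seq2 = "WWWWWLLLLLWWWWWLLLLL"
for seq in (seq1, seq2):
    rate = seq.count("W") / len(seq)
    print(rate, count_runs(seq), expected_runs(seq.count("W"), seq.count("L")))
# 0.5 20 11.0
# 0.5 4 11.0
```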

Is this plot generated from the original data? Could you explain what exactly it’s showing?

I was curious, so I ran the numbers. Here are win rates per weekday, where 0 corresponds to Sunday and 6 to Saturday:

weekday          win_rate    match_count
0 (Sunday)       0.681818    22
1 (Monday)       0.633333    30
2 (Tuesday)      0.428571    28
3 (Wednesday)    0.266667    15
4 (Thursday)     0.440000    25
5 (Friday)       0.478261    23
6 (Saturday)     0.491525    59

There might be a few points of interest in this table, but I would have to look more closely before I felt like I had much to say about it. Sunday and Monday exhibit a much higher rate of winning, and Wednesday a much lower one, but I’m not sure we can conclude anything of direct relevance to the matters at hand.

I should also point out there’s a subtle flaw in grouping by calendar day, since many of the play sessions span midnight.
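A sketch of the per-weekday grouping that also addresses the midnight problem, assuming matches are available as (timestamp, outcome) pairs. The `day_start_hour` parameter is a hypothetical addition: it moves the day boundary so that games played shortly after midnight count toward the previous day. Weekdays follow the table’s convention (0 = Sunday); the demo timestamps are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def win_rate_by_weekday(matches, day_start_hour: int = 0) -> dict[int, tuple[float, int]]:
    """matches: iterable of (datetime, 'W' or 'L') pairs.
    Returns {weekday: (win_rate, match_count)} with 0 = Sunday ... 6 = Saturday.
    day_start_hour shifts the day boundary: with day_start_hour=6, a game at
    01:30 on Monday is counted as part of Sunday."""
    wins: dict[int, int] = defaultdict(int)
    counts: dict[int, int] = defaultdict(int)
    for ts, outcome in matches:
        # isoweekday() is Monday=1 ... Sunday=7, so % 7 gives Sunday=0.
        day = (ts - timedelta(hours=day_start_hour)).isoweekday() % 7
        counts[day] += 1
        wins[day] += outcome == "W"
    return {d: (wins[d] / counts[d], counts[d]) for d in sorted(counts)}


# Two made-up matches: Sunday 22:00 (loss) and Monday 01:30 (win).
demo = [(datetime(2023, 3, 5, 22, 0), "L"), (datetime(2023, 3, 6, 1, 30), "W")]
print(win_rate_by_weekday(demo))                    # {0: (0.0, 1), 1: (1.0, 1)}
print(win_rate_by_weekday(demo, day_start_hour=6))  # {0: (0.5, 2)}
```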

If I were going to analyze this properly, I would use a kernel density estimator to cluster the observations into “sessions” and analyze those; that could reveal something of interest, but I’m not sure we have enough data to generalize a finding like that.
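A kernel density estimator is one option; a simpler, commonly used substitute is to split the timeline wherever the gap between consecutive matches exceeds a threshold. The sketch below does that, and it is not the KDE approach described above; the 2-hour threshold, the function name, and the demo timestamps are arbitrary assumptions.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, max_gap: timedelta = timedelta(hours=2)) -> list[list[datetime]]:
    """Gap-based sessionization: sort the match times and start a new session
    whenever the gap since the previous match exceeds max_gap. Sessions may
    cross midnight, unlike calendar-day groups."""
    sessions: list[list[datetime]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


demo = [
    datetime(2023, 3, 5, 22, 0),
    datetime(2023, 3, 5, 22, 20),
    datetime(2023, 3, 5, 23, 50),
    datetime(2023, 3, 6, 1, 0),   # after midnight, but 70 minutes after the last game
    datetime(2023, 3, 6, 15, 0),  # 14 hours later: a new session
]
print([len(s) for s in split_sessions(demo)])  # [4, 1]
```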

It is a good question though!



Well, Blizzard keeps saying that MMR doesn’t look at stats. My guess is that Skill Rating (the displayed rank) does look at your stats. And this is why there’s confusion between Skill Rating and MMR when people state their rank during a comp match.

I don’t think the devs would lie about MMR calculation; it’s just that they created 2 different values with 2 different calculations, one of the values is hidden, and both calculations are mysterious.
A good example of that confusion was the start of Season 2: everyone saw their SR (displayed rank) lowered, but the devs mentioned that MMR wasn’t changed. So basically, your rank was lower but the games were as hard as at the end of Season 1. So your winrate basically stayed the same, but your rank still climbed to reach the Season 1 value… Confusing as hell.

The good news is that the devs are changing those things from season to season… but it doesn’t help people figure out how the system works if they change it every 2 months :smiley: I do hope it will make more sense in Season 4.

Actually, I’ve just split your 200 games into 8 “periods” of 25 games to check whether the matchmaker would enter a win/lose mode every 25 games. So I split the game results into 8 parts and counted the wins within each to get the winrate for each period.

If matchmaking were trying to get a 50% winrate all the time, the 2nd example shouldn’t exist, especially since if you stop at any point in this sequence, the winrate would be different from .50 (except at the middle point of the sequence, of course).
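The claim about the second sequence can be checked by computing the running winrate after every game. A sketch using `fractions.Fraction` so that the comparison with 0.5 is exact (helper names are illustrative). Besides the midpoint, the final game also lands on exactly 0.5, since the whole sequence is balanced by construction.

```python
from fractions import Fraction


def prefix_win_rates(results: str) -> list[Fraction]:
    """Win rate after each game, counted from the start of the sequence."""
    rates, wins = [], 0
    for games_played, outcome in enumerate(results, start=1):
        wins += outcome == "W"
        rates.append(Fraction(wins, games_played))
    return rates


def stops_at_half(results: str) -> list[int]:
    """Game numbers at which the running win rate is exactly 0.5."""
    return [i for i, rate in enumerate(prefix_win_rates(results), start=1)
            if rate == Fraction(1, 2)]


print(stops_at_half("WLWLWLWLWLWLWLWLWLWL"))  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(stops_at_half("WWWWWLLLLLWWWWWLLLLL"))  # [10, 20]
```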

Yes. I just counted the streaks and put them in a graph to make it visually more comfortable than a 50-line table. Your first losing streak is 4 games long, your 2nd losing streak is 1 game, etc. Maybe I shouldn’t have connected the dots ^^" I was thinking it would be interesting to compare the streak lengths with the winrate to see whether you were doing LLLWWW or LWLWLW.

Imo, it’s interesting to see that Sunday and Monday have much higher winrates, especially with Monday being the 2nd most played day. Saturday isn’t far above the other days, but it seems your 3 best days are around the weekend. Especially since your games are past midnight, I guess Monday games still count as the end of the weekend?

Great writeup. Definitely curious to see if this would be similar say in my own games or in others who feel the streaks happen too frequently. I really appreciate your attention to detail and also keeping an open mind to those who may feel opposite you.

To your credit, character does not matter. Someone correct me if I’m wrong, but I recall that in several interviews now the devs noted the MM NEVER takes character into account when calculating MMR, mainly to avoid punishing one-tricks. Hence why sometimes you’ll end up in games where, say, there’s a Pharah one-trick on one side and not a single hitscan player on the other. But the MM claims the game was even.

It totally does. Take my DPS data: 40W vs 23L. For starters, there is no 0.5 win rate. Secondly, according to your hypothesis, I should have had at least one losing streak and possibly two, yet you can see from my data that I only had consistent wins. This is my main account, which I have been playing since Season 1 of OW1. It shows that your hypothesis does not stack up.