Evidence for the Failure of Competitive Matchmaking in OW2

tl;dr

  • In a large sample (N=202), data show that match outcomes run in alternating streaks.
  • These cycles of winning and losing display a marked intensity and regularity.
  • Players complaining of one-sided matches may be observing similar matchmaking behavior.
  • Protracted cycles of winning and losing diminish player experience of agency, which is incompatible with the premise of a competitive game.

“There are no winner or loser queues in Overwatch.”

Since its launch in October 2022, Overwatch 2 has been haunted by the specter of the “loser queue.” Like all conspiracy theories, the loser queue becomes a screen onto which to project grandiose fears, and a rug under which to sweep unflattering failures. A large contingent of players active on the forums swear that it exists, as their own experience of gruelling disappointment could only be explained by systematic manipulation. An opposing faction dismisses such claims out of hand, as nothing more than low-elo players rationalizing their personal failures. The Overwatch development team publicly states, unambiguously, that there is no loser queue. However, the chaotic launch of Overwatch 2 has burned through a great deal of its player base’s trust, encouraging even more fevered speculation.

Speaking as a professional with relevant experience, there is probably not a loser queue – but not for the reason you might think. A matchmaker might have “favored” and “unfavored” individuals lined up for matching, but it would be impractical and inefficient to segregate those two groups into separate queues. As such, Blizzard can truthfully, technically say “there is no loser queue”. Players who insist otherwise, however, are not really making a statement about the technical architecture of Blizzard’s matchmaking system. They are alleging that the matchmaker seems to match them unfavorably in a way that is conspicuous and recurring. That is an important distinction.

Bimodal Matching: Definition of the Problem

Data suggest that the matchmaker really does decide when it is time for you to win, and when it is time for you to lose. However, I don’t expect you to take my word for it. Let’s unpack what we would expect to see if the matchmaker was systematically misplacing players, and how empirical observations support that hypothesis.

Premises: How MMR Should Work

Match-making rank (MMR) is a quantitative estimate of each player’s expected game performance relative to other players in the pool. Unfortunately, MMR is hidden from players, and Blizzard does not disclose how exactly MMR is calculated. However, assuming that MMR correctly reflects player performance, the following should be true of your MMR:

  1. Given that your own performance is consistent and that the distribution of other players remains similar, your MMR should eventually converge to a more or less stable value.
  2. As your MMR nears its “true” value, the probability of winning a match against other players of similar MMR should approach 0.5.
  3. If your MMR over- or under-estimates your true performance, you will eventually be placed into matches against players that are, respectively, better or worse than you.
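
To make these premises concrete, here is a minimal Python sketch of a rating that converges. It assumes an Elo-style update purely for illustration: Blizzard does not disclose how MMR is calculated, and the names `expected_score` and `simulate`, the K-factor of 32, and the 400-point scale are borrowed from standard Elo, not from Overwatch.

```python
import random


def expected_score(rating, opponent):
    """Standard Elo win probability for `rating` against `opponent`."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))


def simulate(true_skill, start_rating, n_matches, k=32.0, seed=0):
    """Simulate a player whose real strength is `true_skill`.

    Assumption: every opponent is accurately rated and sits exactly at the
    player's current rating, so the system expects a 50% win chance while
    the real win chance depends on the gap between skill and rating.
    """
    rng = random.Random(seed)
    rating = start_rating
    history = []
    for _ in range(n_matches):
        p_win = expected_score(true_skill, rating)  # real chance of winning
        won = rng.random() < p_win
        rating += k * ((1.0 if won else 0.0) - 0.5)  # system expected 0.5
        history.append(rating)
    return history


if __name__ == "__main__":
    ratings = simulate(true_skill=1500.0, start_rating=1000.0, n_matches=3000)
    tail = ratings[1500:]
    print("mean rating over the last 1500 matches:", sum(tail) / len(tail))
```

In this toy model an underrated player wins more than half of their matches until the rating catches up (premise 3), after which the rating hovers around the true value (premise 1) and the win probability sits near 0.5 (premise 2).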

Matchmaking is a complex problem, and there is a lot more that could be said about it, but these premises are the most relevant to the discussion at hand. In particular, they support the arguments of those who dispute the existence of a “loser queue”. Assuming these all hold, then it follows logically that a “stuck” rank is a sign that the afflicted player has hit their true MMR and can advance no further without improving their performance.

Definition: How Matchmaking Might Fail to Work

There is some contention around whether matchmaking is “broken” or not, and that contention seems to stem from vague, idiosyncratic definitions of what counts as “broken”. However, there are patterns, and they do seem to indicate real dysfunction. Some typical contentions:

  • “Every game I play is one-sided.”
  • “In the chat, other players said they were X rank, but I am Y rank.”
  • “I lost N games in a row.”

In all likelihood, these claims do reflect real events. However, they also represent isolated observations that are easily dismissed as cherry-picking. Players with a grievance insist that they know what they saw, and that a functioning matchmaker would not produce such outcomes. Opponents correctly point out that such isolated facts admit multiple interpretations, many of which explain away the allegations in terms of player performance. What is missing here is a falsifiable hypothesis – that is, a specific claim that could be unambiguously confirmed or refuted through observation.

To that end, I propose a criterion for whether or not the matchmaker is working “correctly”:

Every player’s MMR should stabilize after a reasonable number of matches, and the stable MMR should become apparent as a corresponding decrease in the incidence and duration of win/loss streaks.

This is a meaningful definition, in terms of the understanding of MMR in the premises above. As a player’s MMR stabilizes, they should come to find themselves in matches that feel “close”, in the sense that they are being matched alongside and against players of similar performance. Visible ranks might not align (at least, according to Blizzard), and the “one-sidedness” of a game might be subjective, but if games are consistently close, in the sense of being competitive, in the sense of requiring significant and comparable effort from both teams, then that condition should make itself apparent in the history of match outcomes. Specifically, if games are close, that condition should manifest as a mixture of wins and losses that is not too imbalanced at any time scale. Succinctly: if the matchmaker is working well, long streaks should be sporadic and rare. This claim is quantifiable and testable.
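
One simple way to quantify streaks is to collapse a sequence of outcomes into runs of identical results. The sketch below is illustrative only; the function names `streaks` and `longest_streak` are my own, and the sample string is the outcome column of the first 16 matches in the Appendix.

```python
from itertools import groupby


def streaks(outcomes):
    """Collapse a sequence like 'WWLW' into [('W', 2), ('L', 1), ('W', 1)]."""
    return [(result, len(list(run))) for result, run in groupby(outcomes)]


def longest_streak(outcomes):
    """Length of the longest run of identical outcomes (0 if empty)."""
    return max((length for _, length in streaks(outcomes)), default=0)


if __name__ == "__main__":
    first_sixteen = "LLLLLWLWWWLLLWLL"
    print(streaks(first_sixteen))
    print(longest_streak(first_sixteen))  # 5
```

Under the proposed criterion, both the typical run length and the longest run should shrink as MMR settles.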

Now let’s look at some quantitative data that puts this claim to the test.

Winning-Phase and Losing-Phase: Evidence of the Problem

We expect a well-performing matchmaker to eventually produce a healthy mix of wins and losses. By contrast, a poor matchmaker will produce relatively long and frequent streaks of wins and losses, as it struggles to estimate MMR and over-corrects for errors. I will refer to this second behavior as bimodal matchmaking. It is bimodal in the sense that it has two modes: one which places the player in favorable matches, resulting in more wins, and another which places the player in unfavorable matches, resulting in more losses. In other words, we would expect to see exactly the “easy-mode” and “hard-mode” phases of matchmaking that many players have reported.

As such, a sufficiently long record of match outcomes should offer evidence of how well the matchmaker is actually working. Let’s see what the data says.

Methodology: Data Collection

In a previous write-up (“Poor DPS Performance Predicts Match Outcome”), I presented data collected by recording the outcome of Competitive matches that I played during Season 3 of Overwatch 2. Specifically, I recorded the outcome of every match that I played as Support in the Competitive Role Queue. After publishing those findings, I expanded the dataset by continuing to record the same observations. The resulting dataset (N=202) consists of matches that took place between February 07, 2023 (the first day of Season 3) and March 23, 2023. The data were collected from matches played on a Playstation 5, located in the continental United States. My visible rank during this interval began at a low of Bronze 5, and gradually climbed to Silver 3. Outcomes were recorded immediately after the conclusion of a match.

See the Appendix for a copy of the original data, presented in a human- and machine-readable format.

Methodology: Analysis

In order to determine the presence of bimodal matchmaking, I calculated a moving average of the observed win rate, with a window size of 20 matches. I chose a window size of 20 because this is the smallest number of matches sufficient to span the largest number of games required to trigger a recalculation of the visible competitive rank. That is to say: visible rank is recalculated after 5 wins or 15 losses, whichever comes first, which means that calculation of rank might not happen until after a mix of 4 wins and 15 losses, or 5 wins and 14 losses. As such, at least one rank recalculation is guaranteed to take place in each 20-game window.
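
The calculation can be sketched in a few lines of Python. This is a minimal illustration rather than the exact script behind the plot: the function names are hypothetical, and it assumes a trailing window (each value summarizes the preceding 20 games).

```python
def max_games_before_rank_update(wins_needed=5, losses_needed=15):
    """Worst case: one short of both thresholds, plus the deciding game."""
    return (wins_needed - 1) + (losses_needed - 1) + 1


def moving_win_rate(outcomes, window=20):
    """Trailing moving average of wins over `window` consecutive matches.

    `outcomes` is a sequence of 'W'/'L' values in chronological order.
    Returns one value per full window, so a series of n matches yields
    n - window + 1 values (or an empty list if n < window).
    """
    wins = [1 if o == "W" else 0 for o in outcomes]
    if len(wins) < window:
        return []
    running = sum(wins[:window])
    rates = [running / window]
    for i in range(window, len(wins)):
        running += wins[i] - wins[i - window]  # slide the window forward
        rates.append(running / window)
    return rates


if __name__ == "__main__":
    print(max_games_before_rank_update())            # 19
    print(moving_win_rate("WWLL", window=2))         # [1.0, 0.5, 0.0]
```

Since the worst case is 19 games, a window of 20 always contains at least one rank recalculation.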

A moving average of the win rate will necessarily reveal the presence of streaks where a winning mode or a losing mode dominates. Streaks of winning will appear as intervals on which the measurement increases, and streaks of losing will appear as intervals on which it decreases. If streaks recur, the moving average should exhibit corresponding periods of rise and fall. Likewise, if such streaks are absent or sporadic, then the moving average should exhibit periodicity and trend similar to white noise – which is to say, it should exhibit no trend, and no discernible periodicity.
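
For readers who want a baseline for comparison, the sketch below simulates a fair-coin null model: many random 202-match histories with a 0.5 win chance per match, each run through the same 20-match moving average. It reports the share of random histories whose moving average reaches both a chosen high and a chosen low. The function names, the trial count, and the default thresholds are my own assumptions for illustration, and no result is claimed here.

```python
import random


def moving_win_rate(wins, window=20):
    """Trailing moving average over a list of 0/1 win indicators."""
    if len(wins) < window:
        return []
    running = sum(wins[:window])
    rates = [running / window]
    for i in range(window, len(wins)):
        running += wins[i] - wins[i - window]
        rates.append(running / window)
    return rates


def share_of_random_runs_with_extremes(n_matches=202, window=20,
                                       high=0.8, low=0.2,
                                       trials=2000, seed=1):
    """Fraction of fair-coin histories whose moving average touches both
    `high` (or above) and `low` (or below) at some point."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        wins = [1 if rng.random() < 0.5 else 0 for _ in range(n_matches)]
        rates = moving_win_rate(wins, window)
        if max(rates) >= high and min(rates) <= low:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(share_of_random_runs_with_extremes())
```

Comparing an observed series against this kind of baseline is one way to judge whether its swings are larger than chance alone would produce.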

Importantly, a moving average quantitatively reflects the degree to which it “feels” like matches are mostly winning or mostly losing. When evaluating how a play session is going, it is common practice to look back at the outcomes of the last several games, and a moving average simply extends this sort of evaluation to each point in a series.

Findings

Before reading on, please see the plot of the data behind the following link:

https://i.postimg.cc/tgFkDKWT/overwatch2-season3-matchmaker-moving-average.png

As exhibited in the plot, the data under consideration exhibit high variance. The moving average win rate reaches a maximum of 0.8, and a minimum of 0.2. This means that at the peak of my performance I won 16 out of the preceding 20 games, and during my worst run I lost all but 4 of the preceding 20 games. All of which is to say that the period under study encompassed protracted periods of frequent wins, and protracted periods of frequent losses. In spite of this wide variation, the mean win rate taken across the entire data set was observed to be exactly 0.5.

Much more significantly, the time series data exhibits a marked periodicity. The win rate does not merely go up and down – it does so with striking regularity. Win rate reaches a local peak near 50 matches, then peaks again at 100, at 150, and at 200. Likewise, the rate of losses peaks near 25, then at 75, 125, and 175 matches respectively. While these local maxima and minima vary in magnitude, the data clearly show that the balance of wins to losses reverses polarity every 25 matches, and that this effect persists over the course of 200 matches spanning 7 weeks.
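
One standard way to put a number on periodicity is the sample autocorrelation: how strongly a series resembles a copy of itself shifted by a given lag. The sketch below is an illustration only, not the method used to produce the plot. The function name is hypothetical, and the example series is a synthetic square wave (25 wins, then 25 losses, repeated), not the recorded data.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a numeric series at the given lag.

    Returns 0.0 when the series is constant or the lag is out of range.
    """
    n = len(series)
    if lag < 0 or lag >= n:
        return 0.0
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0
    covariance = sum((series[i] - mean) * (series[i + lag] - mean)
                     for i in range(n - lag))
    return covariance / variance


if __name__ == "__main__":
    # Idealized cycle: polarity flips every 25 matches, repeating every 50.
    square_wave = ([1] * 25 + [0] * 25) * 4
    print(autocorrelation(square_wave, 25))  # strongly negative
    print(autocorrelation(square_wave, 50))  # strongly positive
```

A series with a genuine 50-match cycle would show the same signature: negative correlation near lag 25 and positive correlation near lag 50. A random series would show values near zero at both lags.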

These two findings support the original hypothesis: long streaks of winning and losing are present, and they recur with unmistakable regularity. Matchmaking is, in fact, broken.

Discussion: “Matchmaking is rigged.”

I have tried to maintain a restrained and relatively neutral tone, but at this point I must speak plainly: The evidence against the matchmaker is damning.

While I have generally agreed with the contention that matchmaking is dysfunctional in Overwatch 2, I have also been skeptical of claims that the matchmaker was systematically skewing the odds for or against players. Many of these claims seem to rely on speculation, rumor, and suspicion, and most are impossible to positively affirm or conclusively refute. Apologists for the matchmaker are correct to observe that accusations against the matchmaker also make convenient cover for personal shortcomings: Memory often selects for the most extreme events or for those that best fit preconceived narratives, and humans show a strong proclivity for finding patterns where there are none. There remains the prevalent and very strong sense that Overwatch 2 just feels frustrating, but it is a frustration that stems from causes that are hard to identify, and for which it is all too easy to blame one’s self or one’s team mates.

Having said all that, there is only one plausible explanation I can see for the data at hand: the Overwatch 2 matchmaker systematically forces players into losing games, and does so with astonishing consistency. We cannot determine from these observations alone whether that behavior is accidental or designed, but it is clear that there is some calculation important to matching that is triggered every 25 games, and that the outcome of that calculation strongly characterizes subsequent matches. All other explanations strain credulity. Random effects can and do produce repeated outcomes like streaks of wins and losses, but random variation does not produce streaks of such striking regularity. Individual player performance really can tip the outcome of a match from loss to win, or from win to loss, but sustained efforts should produce a gradual trend upward or downward, not runs of predictably alternating outcomes. It is true that one can find spurious patterns in random signals, but the preceding conclusions proceed from a straightforward calculation on clean data, using standard analytical methods. Whether one attributes the matchmaker’s failures to manipulation, incompetence, or to the sheer difficulty of the problem space, the fact remains that it is forcing losses.

The costs of this failure are considerable, in terms of toxic affect and player disengagement. It’s a well-established scientific fact that mammalian animals, humans included, experience dramatically heightened stress in situations of high uncertainty and low agency. Unfortunately, Overwatch 2 consistently creates both conditions, by presenting long runs of matches that feel unwinnable, while subtly but systematically hiding information about factors that are highly relevant to explaining these losses. These findings raise uncomfortable questions about the costs of hiding player ratings and match histories in the Overwatch 2 UI. The forums have featured a steady chorus of players professing extreme frustration, some to the point of clinical depression, and it seems plausible that this trend is driven in no small part by a matchmaker that presents games as “competitive” and “fair” which are anything but. Likewise, it seems reasonable to ask how much bimodal matchmaking has worsened the burden of toxic behavior, rage quitting, and thrown games. While one could object that this is “only a game” and that frustrated players should just walk away, one should also consider what that means for the future of Overwatch if it becomes a large-scale trend.

Conclusion: Look Beyond Rank, or Walk Away

These observations leave us to choose between only two logical conclusions. Either:

  1. MMR accurately reflects expected performance, in which case the matchmaker is systematically mismatching players

or

  2. Players with wildly mismatched MMRs have a credible chance of upsetting expected outcomes, in which case MMR is meaningless

Neither conclusion bodes well for the experience of competitive Overwatch. Either we must accept that our ranks are accurate but that we will be dropped into wildly mismatched games anyway, or that the ranks we are supposedly working toward are empty signifiers with no clear relation to player performance. Neither premise is compatible with a game that styles itself as “competitive”.

I began playing Overwatch in Season 3 of Overwatch 1. I continue to play because I love it. It is a game that is exciting, varied, subtle, and complex. I play Competitive because I like the premise of tackling a challenge, measuring my skills against those of others, and looking for opportunities to improve. I did notice that things felt markedly different in Overwatch 2, but initially shrugged it off as a combination of bad luck, personal performance, and pessimistic thinking. However, I could never completely shrug off the sense that something really was wrong, and that it wasn’t just me. That’s why I began to keep data on my matches: because I wanted a firm record of what I had seen, so that I didn’t have to second guess my memories or intuitions, and so that I couldn’t be gaslit into blaming myself for dysfunctions that seemed to lurk just outside the field of perception.

One reason I have continued to play, in spite of all that has been discussed here, is that I just wanted to get to the bottom of what was happening to Overwatch, and continuing to play gave me that chance. At some point, though, I will have drawn all the conclusions I can. What then?

I know a lot of people have been asking themselves a similar question about their relation to Overwatch: It’s a mess, so what now?

I have been able to continue playing, and to enjoy it, but only by accepting that my rank will go up or down and there’s only so much I can do to control that. I focus on my own performance, in terms of individual mechanics and in terms of synergy with my team. I find satisfaction in those times when I do my job well. The fun of the thing is in the art of doing it. The ranking is just a number, and it’s going to vanish from existence some day when all the servers power down for good. It’s not worth the anguish or the rage.

But I acknowledge that this orientation may not work for some people. In that case, I can only offer this: If you’re no longer having fun, you need to stop playing. That’s it. That’s the whole solution.

Regardless of whether you stay or go, though, I think it’s fair to demand that Blizzard make things right. I do want to give them the benefit of the doubt; there are still many things to love about the game, and the task of designing and stabilizing a system of such breathtaking complexity is not easy. Even so, I think it’s fair to say that Blizzard has been less than transparent or trustworthy in acknowledging and addressing the failures. Negative sentiment is at an all-time high, and the game appears to be hemorrhaging players. I only hope that they are at least as aware of this problem as I am, that they are aware of the seriousness, and that they are working to fix it.

Whatever you choose, I sincerely hope that your time brings you joy. Thanks for reading.

\0

Appendix: Original Data

The original data are formatted below in comma-separated values (CSV) format.

The analysis above uses only the outcome column. However, dates, times, and maps are also included as possible points of interest.

----- BEGIN CSV -----

date,stop_time,duration,outcome,teammate_left,enemy_left,poor_friendly_dps,poor_enemy_dps,map
2023/02/07,23:33,16:04,L,1,0,0,0,Paraiso
2023/02/08,0:26,10:08,L,0,0,0,0,Antarctica
2023/02/08,0:43,6:58,L,1,0,1,0,New Queen Street
2023/02/08,0:59,11:05,L,0,0,1,0,Paraiso
2023/02/08,1:18,14:47,L,0,0,0,1,Blizzard World
2023/02/08,22:36,18:00,W,0,0,0,0,Havana
2023/02/08,22:58,18:24,L,0,0,1,0,Paraiso
2023/02/08,23:16,11:51,W,0,0,0,0,Shambali Monastery
2023/02/08,23:30,9:55,W,0,0,0,1,Ilios
2023/02/08,23:41,7:17,W,0,0,0,1,Junkertown
2023/02/09,0:29,11:13,L,0,0,1,0,Lijiang Tower
2023/02/09,0:44,10:36,L,0,0,1,0,Havana
2023/02/09,1:03,12:42,L,0,0,0,0,Blizzard World
2023/02/09,1:14,5:59,W,0,1,0,0,Antarctica
2023/02/09,1:21,10:21,L,0,0,1,0,Colosseo
2023/02/09,1:52,18:27,L,0,0,0,0,Midtown
2023/02/10,22:18,6:07,L,1,0,1,0,Lijiang Tower
2023/02/10,22:33,11:04,L,1,0,1,0,Paraiso
2023/02/10,22:51,13:31,W,0,1,0,1,Midtown
2023/02/10,23:02,4:19,L,1,0,1,0,Esperanca
2023/02/10,23:23,17:13,L,0,0,1,0,Shambali Monastery
2023/02/11,0:55,10:48,W,0,0,0,0,Colosseo
2023/02/11,1:10,,L,1,0,0,0,Blizzard World
2023/02/11,1:38,22:58,W,0,0,0,0,Rialto
2023/02/11,2:05,7:11,W,0,0,0,0,Nepal
2023/02/11,10:42,13:25,L,0,0,1,0,Dorado
2023/02/11,22:02,17:06,L,0,0,0,0,King’s Row
2023/02/11,22:56,11:29,L,0,0,0,1,New Queen Street
2023/02/11,23:14,13:40,W,0,0,0,0,Midtown
2023/02/11,23:23,6:09,W,0,0,0,1,Colosseo
2023/02/11,23:38,10:54,L,0,0,0,0,Esperanca
2023/02/12,0:38,12:37,W,0,0,0,0,Paraiso
2023/02/12,0:57,14:54,W,0,0,0,0,Havana
2023/02/12,1:15,13:11,W,0,0,0,0,Nepal
2023/02/12,1:27,7:10,L,0,0,1,0,Antarctica
2023/02/12,1:42,12:16,W,0,1,0,1,Esperanca
2023/02/12,1:56,7:03,L,0,0,1,0,Oasis
2023/02/12,2:18,10:26,L,0,0,1,0,New Queen Street
2023/02/12,2:39,17:57,W,0,0,0,1,King’s Row
2023/02/12,15:57,9:45,L,0,0,1,0,Blizzard World
2023/02/12,16:08,8:34,L,0,0,1,0,New Queen Street
2023/02/12,16:43,13:14,W,0,1,0,0,Paraiso
2023/02/12,17:08,20:15,L,0,0,1,1,Rialto
2023/02/12,18:27,16:26,W,0,0,0,0,Shambali Monastery
2023/02/12,18:51,20:04,W,0,0,0,1,Rialto
2023/02/12,23:58,6:03,W,0,0,0,1,Antarctica
2023/02/13,0:20,18:20,W,0,0,0,0,Havana
2023/02/13,23:35,10:37,L,0,0,1,0,Esperanca
2023/02/13,23:54,15:43,W,0,0,0,1,Paraiso
2023/02/14,0:33,6:37,L,0,0,1,0,Havana
2023/02/14,0:53,14:56,W,0,0,0,0,King’s Row
2023/02/14,1:01,10:41,L,0,0,0,0,Esperanca
2023/02/14,1:23,11:53,W,0,0,1,0,Ilios
2023/02/14,1:46,15:25,L,0,0,0,0,Junkertown
2023/02/15,23:59,11:43,L,0,0,1,0,Oasis
2023/02/19,0:41,5:12,L,0,0,1,0,Esperanca
2023/02/19,0:52,6:33,W,0,1,0,1,Numbani
2023/02/19,1:06,9:38,L,0,0,1,0,Ilios
2023/02/19,1:20,10:06,W,0,0,0,1,Oasis
2023/02/19,1:30,5:59,L,0,0,1,0,Midtown
2023/02/19,1:51,16:37,L,0,0,1,0,Dorado
2023/02/19,14:17,5:49,L,0,0,1,0,Esperanca
2023/02/19,14:45,24:48,L,0,0,0,0,King’s Row
2023/02/19,15:12,19:20,L,0,0,0,0,Midtown
2023/02/19,15:40,10:20,L,1,0,1,0,Paraiso
2023/02/19,22:43,17:30,L,0,0,0,0,Blizzard World
2023/02/19,23:08,17:31,L,0,0,1,0,King’s Row
2023/02/19,23:27,10:44,L,0,0,1,0,Circuit Royal
2023/02/19,23:27,7:57,L,0,0,1,0,Ilios
2023/02/20,0:12,8:33,W,0,0,0,1,Nepal
2023/02/20,0:32,14:14,W,0,0,0,1,Midtown
2023/02/20,0:48,13:17,L,0,0,1,0,Dorado
2023/02/20,1:06,10:18,W,0,0,0,0,Antarctica
2023/02/20,1:30,11:07,W,0,0,0,0,Colosseo
2023/02/20,1:53,17:01,L,0,0,1,0,Havana
2023/02/20,2:06,8:16,L,0,0,1,0,Oasis
2023/02/20,2:22,12:20,W,0,0,0,1,Ilios
2023/02/20,13:45,10:20,W,0,0,0,0,Colosseo
2023/02/20,14:04,13:11,L,0,0,1,0,King’s Row
2023/02/20,18:40,8:29,W,0,0,0,1,Ilios
2023/02/20,19:01,13:30,L,0,0,1,0,Nepal
2023/02/20,19:36,13:27,W,0,0,0,1,Havana
2023/02/20,22:12,4:55,W,0,0,1,1,New Queen Street
2023/02/20,22:24,7:59,W,0,0,1,1,Junkertown
2023/02/20,22:39,8:17,L,0,0,1,0,Ilios
2023/02/20,22:48,6:16,W,0,0,1,1,New Queen Street
2023/02/20,23:09,8:26,W,0,0,0,0,Lijiang Tower
2023/02/20,23:27,11:43,W,0,0,0,1,Paraiso
2023/02/20,23:42,12:07,L,0,0,1,0,Nepal
2023/02/21,10:45,23:32,L,0,0,0,0,Rialto
2023/02/21,10:59,6:43,W,0,0,0,1,Antarctica
2023/02/21,11:24,12:18,W,0,0,0,1,Havana
2023/02/21,11:42,10:55,W,0,0,0,1,Colosseo
2023/02/21,11:59,12:15,W,0,0,0,1,Oasis
2023/02/21,22:39,23:23,L,0,0,0,0,Shambali Monastery
2023/02/21,23:00,8:15,W,0,0,0,1,Circuit Royal
2023/02/21,23:16,12:23,W,0,0,0,0,Ilios
2023/02/21,23:29,7:09,W,0,0,0,1,Junkertown
2023/02/21,23:44,11:33,L,0,0,1,0,Oasis
2023/02/22,0:27,17:35,L,0,0,0,0,Dorado
2023/02/22,22:39,7:44,W,0,0,0,1,New Queen Street
2023/02/22,22:51,6:59,W,0,0,0,1,Junkertown
2023/02/22,23:16,10:41,W,0,0,0,0,Oasis
2023/02/24,1:34,6:00,W,0,0,0,1,King’s Row
2023/02/24,20:48,11:25,W,0,1,0,1,Havana
2023/02/24,21:06,13:36,L,0,0,0,0,Circuit Royal
2023/02/24,21:19,10:14,L,0,0,0,0,Lijiang Tower
2023/02/24,22:08,26:01,W,0,0,0,0,Dorado
2023/02/24,22:33,18:17,L,0,0,1,0,Blizzard World
2023/02/24,23:29,14:30,L,1,0,0,0,Antarctica
2023/02/25,0:05,12:03,W,0,0,0,1,Dorado
2023/02/25,0:24,12:27,L,0,0,1,0,King’s Row
2023/02/26,0:49,7:36,L,0,0,1,0,Lijiang Tower
2023/02/26,1:08,14:46,W,0,0,0,0,Shambali Monastery
2023/02/26,1:25,7:21,W,0,0,0,1,Nepal
2023/02/26,1:39,10:30,L,0,0,1,0,Colosseo
2023/02/26,1:59,13:58,W,0,0,0,1,Blizzard World
2023/03/01,0:15,11:41,L,0,0,0,0,Havana
2023/03/01,0:37,11:42,W,0,0,0,1,Rialto
2023/03/03,0:46,16:59,W,0,0,0,0,Blizzard World
2023/03/03,1:13,22:01,L,0,0,0,0,Junkertown
2023/03/03,21:09,12:54,W,0,0,0,0,Midtown
2023/03/03,21:23,10:01,L,0,0,1,0,Ilios
2023/03/03,21:38,10:44,W,0,0,1,1,Numbani
2023/03/04,0:46,8:17,L,0,0,1,0,Nepal
2023/03/04,0:57,6:47,L,0,0,1,0,King’s Row
2023/03/04,1:11,9:14,L,0,0,1,0,Lijiang Tower
2023/03/04,2:04,10:22,W,0,0,0,1,Esperanca
2023/03/05,1:57,8:13,W,0,0,0,0,Shambali Monastery
2023/03/05,2:16,14:54,W,0,0,0,0,Blizzard World
2023/03/05,2:29,8:30,L,0,0,1,0,Lijiang Tower
2023/03/05,2:45,12:15,L,0,0,0,0,Midtown
2023/03/05,15:31,21:20,L,1,1,1,1,Numbani
2023/03/05,23:17,19:14,W,0,0,0,0,King’s Row
2023/03/05,23:40,6:02,W,0,1,0,0,Paraiso
2023/03/05,23:57,12:59,W,0,0,0,0,Nepal
2023/03/06,0:17,15:41,W,0,0,1,0,Numbani
2023/03/06,0:32,10:31,W,0,0,0,1,Shambali Monastery
2023/03/06,23:34,11:30,W,0,0,0,0,Ilios
2023/03/07,22:29,10:40,W,0,0,0,0,New Queen Street
2023/03/07,22:42,8:03,W,0,0,0,1,Havana
2023/03/07,23:04,16:53,W,0,0,0,0,Paraiso
2023/03/08,21:51,13:39,W,0,0,0,0,Oasis
2023/03/08,22:20,18:28,L,0,0,0,0,Midtown
2023/03/08,22:51,12:19,W,0,0,0,1,King’s Row
2023/03/09,22:38,13:34,W,0,0,0,1,Junkertown
2023/03/09,22:56,8:33,L,0,0,1,0,Lijiang Tower
2023/03/09,23:31,10:44,W,0,0,0,0,Esperanca
2023/03/09,23:50,14:06,W,0,1,0,1,Circuit Royal
2023/03/10,0:06,11:44,L,0,0,0,0,Numbani
2023/03/10,0:18,6:15,W,0,0,0,1,Oasis
2023/03/10,23:20,17:13,L,0,0,0,0,Midtown
2023/03/10,23:26,5:48,W,0,0,0,1,Blizzard World
2023/03/10,23:50,7:10,L,0,0,1,0,Ilios
2023/03/11,0:09,15:41,W,1,0,1,1,Numbani
2023/03/11,0:43,16:51,W,0,0,0,1,Lijiang Tower
2023/03/11,22:57,16:34,L,0,0,0,0,Paraiso
2023/03/11,23:19,14:33,L,0,0,0,0,Junkertown
2023/03/12,0:26,11:45,W,0,0,0,1,Rialto
2023/03/12,0:41,10:31,L,0,0,1,0,Esperanca
2023/03/12,22:38,11:39,L,0,0,1,0,Blizzard World
2023/03/12,22:55,10:27,W,0,0,0,0,Colosseo
2023/03/12,23:43,15:23,L,0,0,0,0,King’s Row
2023/03/13,0:40,14:57,W,0,0,0,0,Oasis
2023/03/12,0:36,22:25,W,0,0,0,0,Junkertown
2023/03/14,0:41,6:08,L,0,0,1,0,Midtown
2023/03/14,1:06,6:40,W,0,0,0,1,Rialto
2023/03/15,0:30,5:50,L,0,0,1,0,Nepal
2023/03/15,1:03,16:52,L,0,0,0,0,Dorado
2023/03/15,23:09,12:44,L,0,0,1,0,Ilios
2023/03/15,23:32,12:37,W,0,0,0,0,Midtown
2023/03/15,23:43,7:22,L,0,0,1,0,Antarctica
2023/03/16,0:12,12:19,L,1,0,1,0,Oasis
2023/03/16,23:25,18:10,L,0,0,0,0,Blizzard World
2023/03/16,23:45,10:00,L,1,0,0,0,Junkertown
2023/03/17,0:08,10:58,W,0,0,0,0,Oasis
2023/03/17,0:30,16:25,W,0,0,0,0,Shambali Monastery
2023/03/18,0:00,8:58,L,0,0,0,0,Lijiang Tower
2023/03/18,0:12,6:05,W,0,0,0,1,Paraiso
2023/03/18,0:41,16:41,W,0,0,1,0,Rialto
2023/03/19,1:10,21:53,L,0,0,0,0,Blizzard World
2023/03/19,1:33,12:06,W,0,0,1,0,Nepal
2023/03/20,0:06,18:57,W,0,0,0,0,Numbani
2023/03/20,23:14,22:19,L,0,0,0,0,King’s Row
2023/03/20,23:27,8:36,L,0,0,0,0,Circuit Royal
2023/03/20,23:43,10:14,W,0,0,0,1,New Queen Street
2023/03/21,0:00,11:27,W,0,0,0,1,Paraiso
2023/03/21,0:21,11:03,L,0,0,1,0,Esperanca
2023/03/21,0:40,16:06,W,0,0,0,1,Blizzard World
2023/03/21,22:36,18:46,L,0,0,0,0,Havana
2023/03/21,22:53,14:29,L,0,0,1,0,Numbani
2023/03/21,23:20,7:27,W,0,0,0,1,Shambali Monastery
2023/03/21,23:44,9:37,W,0,0,0,1,Paraiso
2023/03/21,23:45,6:12,W,0,0,0,1,Antarctica
2023/03/22,0:00,8:36,L,0,0,0,0,Rialto
2023/03/22,0:25,19:03,W,0,0,0,0,King’s Row
2023/03/22,23:01,10:57,L,1,0,1,0,Ilios
2023/03/22,23:21,10:00,L,0,0,0,0,Colosseo
2023/03/22,23:40,14:23,W,0,0,0,1,Dorado
2023/03/23,0:00,15:08,L,1,0,0,1,Oasis
2023/03/23,0:21,16:05,L,0,0,0,0,Circuit Royal
2023/03/23,0:46,12:29,L,0,0,1,0,Nepal

----- END CSV -----
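
For anyone who wants to load the data, here is a minimal Python sketch using only the standard library. The function names are hypothetical, and the handling of short rows is an assumption on my part: hand-recorded data can contain rows that are one field short (for instance, a missing duration), so the parser pads that case rather than discarding the match.

```python
import csv
import io

FIELDS = ["date", "stop_time", "duration", "outcome", "teammate_left",
          "enemy_left", "poor_friendly_dps", "poor_enemy_dps", "map"]


def parse_matches(text):
    """Parse the CSV text into a list of dicts, one per match.

    Skips blank lines, the header, and any line that does not look like a
    data row (such as BEGIN/END markers). Assumption: a row with one field
    too few and a W/L in the third position is missing its duration.
    """
    rows = []
    for raw in csv.reader(io.StringIO(text)):
        if not raw or raw[0] == "date":
            continue
        if len(raw) == len(FIELDS) - 1 and raw[2] in ("W", "L"):
            raw = raw[:2] + [""] + raw[2:]  # pad the missing duration
        if len(raw) != len(FIELDS):
            continue
        rows.append(dict(zip(FIELDS, raw)))
    return rows


def win_rate(rows):
    """Overall share of matches won."""
    return sum(r["outcome"] == "W" for r in rows) / len(rows)


SAMPLE = """date,stop_time,duration,outcome,teammate_left,enemy_left,poor_friendly_dps,poor_enemy_dps,map
2023/02/07,23:33,16:04,L,1,0,0,0,Paraiso
2023/02/08,22:36,18:00,W,0,0,0,0,Havana
2023/02/11,1:10,L,1,0,0,0,Blizzard World
"""

if __name__ == "__main__":
    matches = parse_matches(SAMPLE)
    print(len(matches), [m["outcome"] for m in matches])
```

The `outcome` values can then be joined into a single string of W and L characters for any streak or moving-average analysis.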

56 Likes

Great post.

In the end, I’ve decided as a long time gamer… I can’t stand any MMR system.

Use Rank. It comes along with a bit of gaming and shenanigans from the PLAYERS, but I much prefer that over the shenanigans from the actual Matchmaker!

I mean, look at this game. They’re trying to use some super advanced system to match players… only to offer up one of the most frustrating, universally hated, ranking systems of all time.

So, is MMR any good? No. It’s horrible. People hate it, don’t trust it, and for good reason. The game would be better in the long run if they just turned it off tomorrow.

Use Rank. The end.

24 Likes

Thanks for such a detailed post.

5 Likes

I didn’t read all of it, much thanks to the tl;dr - it sounds like you took the hot mess of word soup I usually say, mixed with some venting, and formed it into something data-backed with proper English usage.

Thank you.

4 Likes

Great post.

I did a much simpler test by simply comparing profiles and looking at their ranks. I exclusively played support, so I exclusively looked at dps and tank and assumed that I’m “afk”, so I blame every support diff on me being “hardstuck” and “absolutely bad”. I also only looked at insane outliers, like playing against top500 dps vs a hard stuck plat dps on my team. At the end I simply compared dps and tank positioning, aim, and overall game sense.

The funny thing is in the last match I had HARDSTUCK PLAT DPS and the enemy team had ONE TOP500 DPS in a Diamond match.

You can argue all day about “support diff”, but those dps were not equal, no matter how “hidden” the real mmr is. And no, the top500 dps wasn’t a Mercy main and played like a real top500, while my dps really played like hard stuck plat.

And for Blizzard, here is the code of the rigged match “GWQCF5”. If you really want a fair match making, fire the people responsible for this garbage rigged system, they literally can’t even include the official MMR (if top500 and hardstuck plat dps is the same in hidden MMR, then we have another problem that makes the people behind this garbage code look even worse) in their own magical fairy tale “delta”. This match should have never happened, no matter the end result of the match (which was, surprise surprise, a forced loss with the plat dps making all the mistakes at the end to lose the game). BTW, the match was only close because the enemy team wasn’t looking at cart, or it would have been over pretty fast.

So yeah. I have enough proof that the match making is rigged on my own and this post is using another method with the same result.

No matter what rigged “delta(?)” Blizzard is using for the match making, the match making is rigged and my last match alone proves beyond any reasonable doubt that (some(?)) matches are in fact 100 % rigged and matches are created that were always forced loss WITH BLIZZARDS OWN DATA THEY PROVIDE without any external factor like smurfing or some people having a bad day.

Overwatch is dead and it’s easy to see why. WHY should anyone play a game where the player is forced to spectate a forced loss? It’s one thing to get smurfs or other external factors that can ruin a game. You can’t blame Blizzard for anything they can’t influence. But if Blizzard’s own system literally rubs in my face with the player profiles that I’m now hard losing, that alone is beyond disgusting.
That Blizzard openly rubs the forced loss in your face with a top500 against a hard stuck plat in a ROLE EQUALITY QUEUE is killing the game. Literally all arguments that the match maker is “fair” are now gone, if Blizzard’s own system says that it’s a hard forced loss without even bothering to hide it.

12 Likes

ow2 is a weekend warrior cash grab now. no competitive integrity at all lol. GG BLIZZ GG

11 Likes

That’s very interesting. I wondered if there is a way to break the streaks early, but it seems the only way is to go through 25 matches.

I’m interested in tracking my matches for season 4 and doing some analysis on it. Maybe together with time of day + day of week info.

There is the first flaw in this analysis. If you really were a professional, you would know that you cannot take data from yourself. The chance of bias or unconscious manipulation of results is far too high, especially where you control the results. The chance of you being a professional and presenting this analysis is zero.

Your sample is actually a sample of 1 and not a large sample. Not enough to draw any conclusion from. I could show you my data from season 3 which is below which will totally contradict the data that you have compiled. Even then, it is only a sample of 2 if you include yours.

Heals, 5W-4L, 5W-2L, 5W-6L, 5W-5L, 5W-3L, 2W-5L
DPS, 5W-2L, 5W-7L, 5W-3L, 5W-4L, 5W-2L,5W-3L,5W-1L,5W-2L
Tank, 5W-5L,5W-11L, 5W-3L, 5W-6L, 5W-1L
Open Queue, 5W-7L, 5W-2L,5W-2L

Overall 110W-81L

I’ve been from silver through to diamond and the players in diamond are way better than the players in silver. Fact. You are not being held back by the matchmaker. Every 2 pips of a rank the players are noticeably better. The ranking is grading players according to their ability.

If you are a DPS/tank and want to get better, play healer for a couple of seasons. You will notice that almost every death is the player’s fault. Sometimes you want to scream at them. They will just blame teammates and the matchmaker and vent here when the solution is entirely in their own control. I’d also recommend mystery heroes as it teaches you not to die.

5 Likes

if you want a large enough sample, then maybe look at other players who complain about the same problem, and start caring from there.
blaming the player alone won’t help anyone. just like what the devs did.

2 Likes

I agree. My condition is not stable and it is clear that when I feel worse I start to lose and then when I get better I have a winning streak again. This is a pattern that repeats itself over and over again, so it’s not a one-time observation for me, it’s a rule. It is quite possible that losing streaks are in most cases caused by psychological factors: after a winning streak, players feel more and more pressure to keep the streak going, which leads to “try-harding” and a narrowing of perception and, as a result, worse results in the game. In addition, the so-called “tilt” after a loss that interrupts a series of wins can deepen this state of affairs. Criticizing and blaming others is a common way to deal with your mistakes, but sometimes it’s worth taking a break, calming down, watching the replay, and seeing what you could have done better.

Ps. It’s possible that winning streaks cause players to stay in the game longer than usual, which can cause exhaustion and also contribute to losing streaks. My advice is to play it easy

1 Like

Thanks to everyone who replied! If you haven’t already, please make sure to take a look at the chart that illustrates the main finding:

https://i.postimg.cc/tgFkDKWT/overwatch2-season3-matchmaker-moving-average.png

(It’s unfortunate that the forums won’t allow embedding an image; having the main plot right in the middle of the text would have looked much better. As it is, the URL gets lost in the rest of the text.)

To the individual points:

This is one possible explanation, yes, and I would not dismiss it out of hand, but it leaves two important questions unanswered:

  1. Why does the distribution reverse modes with such consistent regularity, i.e. every 25 games?
  2. Why does the effect appear even though every cycle spans multiple play sessions?

These are related, but let’s consider (1) first.

It is plausible to claim that players might get into a certain behavioral or psychological “groove”, and that the concomitant intensity burns out and leads to a crash. Certainly, there are many relevant variables that might drive player behavior in a way that could affect performance, and thereby outcome. The problem is that these effects ought to be local with respect to clock time, not with respect to match count. That is to say: if I burn out after an intense winning or losing streak, I would expect to be able to predict this outcome after a certain number of minutes or hours, regardless of the number of matches played. Attention definitely fatigues after some time, but it is going to vary with time spent, not matches played. However, the seasonality observed in the time series above is tied to match count, not to clock time. It would be extremely strange, to say the least, if I became fatigued or inspired at 25 matches consistently, when the total clock time to complete 25 matches varies as much as it does.

Point (2) is even more serious.

Here’s a plot from the dataset, showing how many matches I played per single calendar day:

https://postimg.cc/dk6nk5tP

One thing should immediately jump out: I never played enough matches in a single day to traverse a full 25-game cycle. Most days, I played fewer than six matches. If the 25-match effect is based upon some kind of player fatigue or intensifying emotional affect, how does that fatigue or affect carry over to a session that usually takes place about 24 hours later? What about the fact that I’ll be eating, sleeping, working, and doing things other than Overwatch in between? Never mind the fact that it would usually take multiple such sessions to complete a full cycle. To be honest, it seems far-fetched to propose that I stay so tired and angry between sessions that my losing streak can pick up right where it left off – and then end right on schedule.
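For anyone who wants to run the same check on their own history, here is a minimal sketch of the per-day count. The record layout (a list of date and outcome pairs) and the sample dates are made up for illustration; they are not rows from my spreadsheet.

```python
from collections import Counter
from datetime import date

def matches_per_day(matches):
    """Count how many matches were played on each calendar day."""
    return Counter(day for day, _outcome in matches)

# Hypothetical records, in the order played: (calendar date, "W" or "L").
matches = [
    (date(2023, 2, 10), "W"),
    (date(2023, 2, 10), "L"),
    (date(2023, 2, 10), "W"),
    (date(2023, 2, 11), "L"),
    (date(2023, 2, 13), "W"),
    (date(2023, 2, 13), "W"),
]

CYCLE_LENGTH = 25  # length of one streak, in matches, as discussed above

per_day = matches_per_day(matches)
busiest = max(per_day.values())
print("most matches in one day:", busiest)
print("any single day long enough to span a cycle?", busiest >= CYCLE_LENGTH)
```

If the busiest day is far below the cycle length, every cycle necessarily spans several separate sessions.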

Well, you definitely shouldn’t believe everything you hear on the Internet, so I applaud your skepticism. However, I would invite you to return to the original post and read it more closely. The only point that I qualified with “as a professional” was a passing remark about how Blizzard could say “there is no ‘loser’ queue” and how that could be true in a very narrow technical sense. That point is not central to the analysis at all.

The point of the original post is that my complete match data for Season 3 showed cycles of winning and losing that repeated on a stunningly regular cadence, and that such a phenomenon simply should not arise if the matchmaker is working as advertised. The argument proceeds from logical premises and empirical data, and the original data is included in the original post if you’d like to run the numbers yourself. You could believe that I scrub toilets for a living and the facts would still remain.

This would be a sample of N=1 if my hypothesis were “are all Overwatch players experiencing this same periodic matchmaking behavior.” Now, I admit that is an interesting question, and I would love to know the answer. I agree there is reason to be skeptical that everyone is seeing this behavior. I’m playing on a rather old account, and a lot of anecdotal evidence suggests that the matchmaker is struggling in particular to calibrate MMR under this condition. Not everyone is playing an old account, and not everyone is playing from the same point on the MMR distribution, and both of these conditions probably matter quite a bit. While I think there probably are other players experiencing a similar phenomenon, I also think there are others who are not.

To the point: The hypothesis under consideration in the original post is, “Given this observed time series, how probable is it that a correctly functioning matchmaker would generate such markedly and regularly bimodal match outcomes?” If you’re familiar with statistical inference, then you’ll recognize this is perfectly analogous to measuring log-likelihood of a sample with respect to a hypothesized distribution. Of course, analyzing time series requires different analytical tools than those applied to unordered samples, because points in a time series are generally not independent or identically distributed (iid) – but that doesn’t mean analysis is impossible. In fact, folks do it all the time.

Since you seem interested in the particulars, let me break down the analysis even further. Because the original post was rather long, I worried folks’ eyes would glaze over if I also went into the details of hypothesis testing, but I’m happy to elaborate. So here we go:

I wanted to be sure that I wasn’t spuriously imposing imagined patterns on the series, so I followed the standard approach of de-trending the series and searching for possible seasonality. The original chart is striking – it’s what inspired me to write all this up in the first place, because I was shocked the first time I saw it – but I wanted to make sure that the visual intuition was reproducible from impartial quantitative analysis. I de-trended the series by taking a first-order difference, then tried fitting a sine function to the result, using scipy.optimize.minimize, parameterized over the frequency. Sure enough, 25 plus or minus a tiny epsilon popped out as the best-fitting parameter. Assuming that the seasonality here is additive and not multiplicative, I subtracted the fitted sine function from the de-trended series, and applied the Augmented Dickey-Fuller Test (ADF) to the result, to test the hypothesis that only white noise remains after the trend and seasonality are removed from the original series – i.e. that seasonality explains most of the observed variation after trend is removed. The resulting ADF statistic comes in at about -2.6, with a p-value of 0.08, which is easily strong enough to support the hypothesis at hand, i.e. that the seasonality accounts for most of the non-stationarity observed in the original series.
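For readers who want to try the de-trend-and-fit step themselves, here is a rough sketch. It is not the code behind the numbers above: it swaps the optimizer for a plain grid search over candidate periods (for a fixed period, the amplitude, phase, and offset reduce to a linear least-squares fit), and it runs on a synthetic series whose drift, noise level, and period of 50 are invented for the demo. The function name `fit_sine_period` is likewise made up.

```python
import numpy as np

def fit_sine_period(y, candidate_periods):
    """Return (period, sse) of the best least-squares sinusoid among the candidates."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    best_period, best_sse = None, np.inf
    for period in candidate_periods:
        # With the period fixed, amplitude, phase and offset are a linear fit.
        design = np.column_stack([
            np.sin(2 * np.pi * t / period),
            np.cos(2 * np.pi * t / period),
            np.ones(len(y)),
        ])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        sse = float(np.sum((y - design @ coef) ** 2))
        if sse < best_sse:
            best_period, best_sse = period, sse
    return best_period, best_sse

# Synthetic stand-in for a smoothed win-rate series (NOT the real data):
# a slow upward drift plus an oscillation with a made-up period of 50 matches.
rng = np.random.default_rng(0)
t = np.arange(202)
series = 0.5 + 0.0005 * t + 0.2 * np.sin(2 * np.pi * t / 50) + rng.normal(0, 0.01, t.size)

detrended = np.diff(series)  # first-order difference removes the linear drift
period, sse = fit_sine_period(detrended, range(10, 101))
print("best-fitting period:", period)
```

The remaining step would be to subtract the fitted sinusoid and test the residual for stationarity, for example with `adfuller` from `statsmodels.tsa.stattools`, which is left out here to keep the sketch dependent on NumPy only.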

To the earlier point, this,

is untrue.

The original hypothesis technically hinges on the question of whether the time series is mostly white noise after the seasonality has been removed, and this in turn can be determined by testing for stationarity. The ADF statistic does this, and the numbers show that there’s only an eight percent chance we would draw a sample this stationary from a non-stationary series. A back-of-the-envelope calculation shows that a sample size of N=202 ought to be more than enough for 90% confidence with a 5% margin of error.

The data is up there if you’d like to counter with your own analysis.

Thanks for sharing your data, as it is relevant, and I would love to see more people sharing their data, as it gives us a better insight into what might be happening.

However, this data doesn’t “contradict” my data, because that’s not how data works. It might contradict a hypothesis or a claim, but by definition data is what it is. It would be interesting to compare your results, but your dataset is missing a key feature necessary to reveal the presence or absence of streaks: You have not included wins and losses in the order that they happened. As such, we can’t run the same moving average on your data as was run above, which means we can’t unambiguously compare the two.
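To make the point about ordering concrete, here is a minimal sketch of a moving-average win rate. The function name, the window size, and the sample sequence are placeholders for illustration, not the settings used for the chart above.

```python
def rolling_win_rate(outcomes, window):
    """Win rate over each run of `window` consecutive matches, in play order."""
    wins = [1 if outcome == "W" else 0 for outcome in outcomes]
    return [
        sum(wins[i:i + window]) / window
        for i in range(len(wins) - window + 1)
    ]

# Made-up sequence: the same wins and losses in a different order
# would give a different curve, which is why per-block totals are not enough.
ordered = "WWWWLWLLLLLWLWWW"
print(rolling_win_rate(ordered, window=4))
```

Totals such as 5W-4L fix only the overall mean of a block; the curve needs the outcomes in the order they were played.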

I would also point out that your dataset is considerably smaller than mine. The Support data contains only 52 total matches, which would be only just large enough to maybe see the cycle turn once, if it is in fact present.

At any rate, you certainly can’t combine the roles into a single dataset, because the matchmaker uses different MMRs for different roles – combining them would mix unrelated distributions, and give spurious results.

Nonetheless, if you do have the original outcome data in the order the matches were played, I would love to see it.

I addressed this point above, but it’s worth asking again: Do you really mean to suggest that my ability abruptly reverses quality, on schedule, every 25 games?

Thank you, and you’re right!

If you do find a way to break the streaks, that itself would be very interesting. It would be a little tricky to tease apart from other factors, so it’s probably worth forming a hypothesis in advance. Good question though!

Likewise, if you do keep this data in the upcoming season, I would love to see it. It’s likely something will change with the matchmaker in Season 4 – but it’s anyone’s guess what, and we won’t know if we don’t keep the record.

You’re welcome, and thanks for reading.

Also thanks for reading!

Strongly agree. I can even see, in principle, some reasons that one might use some kind of hidden metric for matchmaking – but it would be for reasons like smoothing out wild oscillations in rank e.g. for a new account. Even then, I don’t think it’s strictly necessary, and it’s clear that the existing system performs very poorly. Regardless of whether or not it’s doing what it’s “supposed” to do, it’s clear that everyone is having a bad time and hates it, and that should be enough reason to ditch it.

Phenomena like these are really noteworthy, and go back to another related problem with the current system: Groups are hidden. Blizzard argues that they don’t want people to form preconceptions of the match, and thereby “give up early”. Even if that is a worthwhile goal, the fact remains that it hides a factor that would be highly relevant in explaining how the matchmaker is giving out such strange results, and in interpreting why a match went the way it did. For example, if one team is being hard carried by someone way outside the MMR distribution of the rest of the match, that is highly relevant. Even just seeing that information at the end of the match would be worthwhile.

Could you elaborate on this a little bit? If you’re talking about looking at the profiles of other players, I admit that I’ve had poor luck with this approach, as most of the profiles I try are private.

5 Likes

One thought. The problem may be that the ranked system focuses too much on wins and not enough on what happens in the match. For example it treats a 0-3 blowout exactly the same as a tie match where you barely lose in overtime. I think a lot of players would be a lot less sour about getting losses if they knew they were still getting some credit for what they did during the match. Looking at what happens in game would also help the ranked system to converge on more accurate ranks faster because it is making use of more information and it would somewhat decrease the impact that being placed on a “losing team” has on your rank.

3 Likes

Great post, add this to the list of things to provide next time someone says there’s no proof or “where’s your evidence?”

Not that we need this post to prove the matchmaker is bad since that proof is there through countless thousands of eyewitness accounts, but this is a really great post to have.

2 Likes

They said it’s based on win rate, meaning you gain points when you win and lose points when you lose, no matter the stats within a game. Iirc they mentioned that the quantity of points gained or lost depends on the MMR of your opponents. I guess they can’t give you the exact calculation, as the points you’d earn/lose after each match could be different.

Iirc the devs said the matchmaker uses a “predictability” value, and it makes sense actually. If you win, your MMR increases, and a question comes to the matchmaker’s mind: are you under your “real” MMR or not? So probably the more you win, the stronger the opponents you’ll face, with a multiplicative increase.
That helps to put people close to their “true” MMR quickly enough.

The devs said on Eskay’s stream that there’s also an “uncertainty” value attached to MMR. From what I understand, it grows when people don’t play for a long time (so it is probably tied to MMR decay).

The problem is that we don’t know when “predictability” and “uncertainty” kick in, or how hard they kick.

As MMR is hidden, I’m not sure how people can say that “MMR reflects performance”.
It ties in with people stating their ranks in a game: it’s pointless, as Skill Rating and MMR are two different numbers and MMR is hidden. The displayed rank isn’t a reflection of MMR or matchmaking, and should stop being taken as one.
The devs say Season 4 will fix that by bringing SR and MMR much closer to each other, so we may be surprised by some SR jumps/drops in S4.

I’m sorry, I might not understand and I might be wrong, but I imported your data into a text editor: you have won 101 games and lost 101 games. That makes a perfect 50% win rate over your 202 games.
If we look at the win rate graph, it’s easy to see you have been over a 0.5 win rate for longer than you have been under it.
So I don’t understand how you can say that the matchmaker systematically forces losing games, as that would mean you’d end up under a 0.5 win rate. Or do you mean to say that a “forced 50/50 win rate” is a thing?

There are also things to consider when treating that kind of data:

  1. The matchmaking system has been updated during season 3.
  2. The matchmaking system also takes queue time into consideration.
  3. We have no idea about the variation of your MMR after each game. Especially since you might have been playing when there were too few players around your MMR to get a balanced game.

Imo your win rate graph shows there was a problem between the 50th and 90th games. The hill after the 125th might be weird, but maybe you did improve (or their update had an impact)? Maybe you started playing a different hero? But it shouldn’t have felt that bad.

That’s also a problem regarding the matchmaking design: should it be fair or should it “feel good”?
I also feel like, just by looking at outcomes without stats, it’s hard to know if the matchmaking is failing or not: there are tons of reasons to lose a match.
Even if we had stats, I feel it would still be hard to know precisely when matchmaking farts or not. Sometimes you can tell, as you have ONE player getting much worse or much better stats than all the other players. Or when one team wipes like every minute.
But still, without having a look at the MMR of each player, it’s still assumptions and no facts.
The devs said there were MMR problems and bugs… But it doesn’t mean that we can say we spot those problems just by looking at our history.

2 Likes

Amazing effort. Thank you for this analysis

3 Likes

Update: I’ve included a short summary of this finding in a report on the Overwatch 2 Bug Reports forum. If you feel you’ve been impacted by this issue, you might consider signing on to that thread.

Now responding to the latest points:

This is very kind of you to say. If folks have found this post useful, then I am glad to have made it.

As I said, they do not disclose how exactly MMR is calculated. This omission is significant, though, as it muddies the question of what MMR reflects, or whether it reflects anything at all. However, we can be reasonably certain that it is the most important variable in assigning matches (as its name suggests), and so we can make indirect inferences about it by observing a large number of matches.

Which immediately contradicts the claim that the matchmaker uses only wins and losses in calculating who will match with who.

Do we have any particular evidence that the increase is multiplicative as opposed to e.g. additive?

This is yet another variable in the calculation of MMR, other than wins and losses. We now have that MMR is based on:

  • Recent match outcomes (win/loss)
  • Intensity of engagement as a function of time
  • An unknown confidence term (“predictability value”)

To the point of MMR decay:

Here’s a quick plot of the same data, charting the number of matches played per calendar day:

https://postimg.cc/McBKY1x3

It’s worth pointing out that the data above is spread more or less evenly over the observation interval. In the 49-day observation period, there were only five day-resolution intervals wherein I played no matches at all, and none of these lasted more than three days. If my MMR did decay, then it did so very abruptly.

I assure you that people absolutely do say this, and if you spend any amount of time on this forum, you will definitely encounter them.

If I understand what you’re saying here, it would appear that public statements by the development team directly contradict you. In a January 30 post to the Developer Blog, Blizzard’s spokesperson says:

A player’s visible rank will move towards their rating over time as they continue to play during a season.

If people are seeing numerous out-of-place ranks in their matches, it suggests either that MMR has failed to move close to the visible rank, or that it has and that the matchmaker is simply making poor matches.

Thank you for taking the time to look more closely at the data. I have taken this fact into account; I noted it in the original post:

Please remember, though, that this is a global mean. As such, there are many different sequences of outcomes that have the same mean. For example, both of these sequences have a mean win rate of exactly 0.5:

WLWLWLWLWL
WWWWWLLLLL

However, they represent very different states of affairs. Yes, the data above has a global mean of 0.5, but the data also has an extremely high variance through time. It reaches the 0.5 value through a wild oscillation between very high highs and very low lows, and it just so happens these balance out by the time we reach the end of the table. This might be incidental, and it might not, but I don’t think there’s enough information in the analysis above to say with certainty.
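The difference between those two sequences is easy to check by hand, but here is a small sketch that makes it explicit. The helper names are made up for the example.

```python
from itertools import groupby

def win_rate(outcomes):
    """Fraction of matches won, ignoring the order they were played in."""
    return outcomes.count("W") / len(outcomes)

def run_lengths(outcomes):
    """Lengths of consecutive identical outcomes, in play order."""
    return [len(list(group)) for _, group in groupby(outcomes)]

for outcomes in ("WLWLWLWLWL", "WWWWWLLLLL"):
    print(outcomes, "mean:", win_rate(outcomes), "runs:", run_lengths(outcomes))
```

Both sequences share the same mean, but the first consists of ten runs of length one and the second of two runs of length five, which is exactly the kind of structure a global win rate cannot see.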

I have considered the hypothesis that the matchmaker is pushing outcomes toward a global mean of 0.5 in all cases, but I don’t think I have enough information, at least by myself, to say how probable this might be. There’s reason to suspect it, yes, but it’s not self-evidently true either.

I’ve stated that the “matchmaker systematically forces players into losing games” because it demonstrably switches between modes of favorable and unfavorable matchmaking, and appears to do so on a remarkably regular cadence. It strains credulity to suggest that any of the other factors that could be determining match outcome alternate mode every 25 games, and continue to do so over the course of 200 matches.

To these points:

It’s remarkable that all of these factors might vary match outcome and the signal still shows a strong 25-match periodicity. That means that the influence of matching is so strong that even these numerous possible sources of noise are insufficient to drown out the signal.

I would also point out that I marked the dates of the two major patches (February 21 and March 07) on the original plot of the win rate. These are the dark-grey dashed vertical lines. They do not appear to coincide with any particular change.

My opinion? This is a game, and if it feels bad, then it’s a bad game. People will stop playing.

In this case, it’s not hard. Go back to the arguments in the original post: if the matchmaker were matching evenly, then MMR would converge, and match outcomes would resemble white noise, possibly colored by a weak trend upward or downward as long-term player behavior changes. What has demonstrably happened in this case is that the matchmaker produced regularly repeated streaks of winning and losing, and did so over the course of two months of steady play. This was not a one-off or a black swan – this was a marked behavior of the system, which it exhibited consistently over a long period of time.
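If you want a quick check of the “white noise” expectation on your own match history, one simple option is the Wald-Wolfowitz runs test. To be clear, this is a different and cruder tool than the analysis described earlier in the thread, and the example sequence here is invented. It compares the number of runs actually observed with the number expected if wins and losses arrived in random order.

```python
import math
from itertools import groupby

def runs_test_z(outcomes):
    """Wald-Wolfowitz z-score: negative means fewer runs (longer streaks) than chance."""
    n_wins = outcomes.count("W")
    n_losses = outcomes.count("L")
    n = n_wins + n_losses
    runs = sum(1 for _ in groupby(outcomes))
    expected = 2 * n_wins * n_losses / n + 1
    variance = (
        2 * n_wins * n_losses * (2 * n_wins * n_losses - n)
        / (n ** 2 * (n - 1))
    )
    return (runs - expected) / math.sqrt(variance)

# Invented example: four blocks of 25 identical outcomes, 100 matches in total.
streaky = ("W" * 25 + "L" * 25) * 2
print("z-score:", round(runs_test_z(streaky), 2))
```

A z-score far below zero means the outcomes are much streakier than a randomly ordered sequence with the same win and loss counts. The test says nothing about whether the streaks are periodic, so it complements rather than replaces a seasonality fit.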

So, I apologize in advance if it seems like I’m being cross with you here. I sense you’re making a good-faith argument, and although I disagree, I do still respect and appreciate your engagement. I’ll try to avoid a harsh tone, but that can be difficult in text, and I admit that I’ve grown weary of certain arguments, enough so that they’ve started to irritate me. That’s not your fault, and so please understand my next remarks are directed more at a genre of argument than at you personally.

To the point:

Please step back and consider the claim at hand. We are literally looking at a spreadsheet of 200 dated match outcomes subjected to statistical analysis. If that is not factual enough, then what is? What is it going to take? 100 more matches? 1000 more matches? A dozen more players showing up with spreadsheets just like this one? I am literally sitting down and transmitting facts directly to you, the reader – if this is not factual enough, then what will be?

I have easily read dozens of posts in defense of the matchmaker, asserting that any claim against it is “all assumptions” and “no facts”. None of these posts seem to offer any meaningful explanation of their own, beyond “works fine for me” and “everyone else is whining or imagining things.” Their only “facts” are their own gut feels that things seem fine. Of course, they have endless ways to explain away every claim to the contrary – but they never seem to indicate that there is any evidence that could possibly sway their unshakable faith in the goodness and rationality of the Overwatch 2 matchmaker.

I’m sure there is some variation in player experiences of the current matching regime. It probably is working for some people out there. But the temperature of the forums shows that there is a predominantly negative experience of matching right now, and it strikes me as nothing short of magical thinking to dismiss all those reports as nothing but delusion and hysteria, especially when they are confirmed by empirical observation.

The facts at hand are abundant. The matchmaker produces streaks, and it does so with eerie regularity. According to the statements of the Overwatch developers, it is not supposed to behave that way. Quoting again from the same developer blog referenced above:

Sometimes, if a player goes on a very long win/loss streak, it’s indicative that their internal rating is not well-calibrated. The best way to calibrate your rank is to continue playing competitively. The more data we have, the closer you’ll get to a rank that best represents your skill.

I want to emphasize, again, that I have been playing Overwatch 2 on this very same account since the start of Season 1. In fact, I have been playing Overwatch since Season 3 of Overwatch 1. The matchmaker has abundant data on me. Exactly how long should it take to become “well-calibrated”?

The same post continues:

However, there are times when players are going to get lucky with their win streaks or the opposite with loss streaks.

echoing a common talking point of MMR apologists: that any streaks are just the sort of outliers that inevitably appear now and then in any large sample of a random variable. But if these streaks are outliers, then how is it that they repeat with such uncanny regularity? Do you really mean to suggest that my “luck” abruptly changes every 25 games, and that it has been doing so consistently for the entirety of Season 3? Really? If “luck” is the only explanation that even developers can offer for these kinds of streaks, then we must look elsewhere for an explanation, because “luck” does not change on a fixed schedule.


I’m sorry to go off like that, but I’ve kept the text as written because I really think it needs to be said: The matchmaker is unequivocally broken, and we need to stop telling people otherwise. The evidence is absolutely there. If this is not enough, then what is?

4 Likes

Was a single hero one-tricked for all 202 games as a control, or were different heroes used?

Bottom line is, you can’t just play the game and trust it.

You have to dance around the matchmaker.

That means the matchmaker itself is too big of a factor. It should feel non-existent.

It doesn’t. It’s rigged.

2 Likes

From a podcast I know they already have these stats (how “balanced” matches actually are) but don’t include it in the rating of a player.

I have also noticed this type of, let’s call it, “difficulty pulse”. The truth is that if the matches are not challenging for the player then the players will simply lose interest and at the same time if the challenge is too high for an extended period of time it will lead to “burnout”. My observation is that the algorithm will at some point substitute higher ranked players into your role on the opposing team. If you’ve made progress, you should be able to overcome this “difficulty pulse” hill. I don’t know what further consequences it has, but I think that then this system starts to fulfill a second role, namely your MMR increases.

My ability to concentrate is wobbly, and when I regain my focus I manage to win on these hills of the pulse, so it’s not really a “loser queue”, just matches of a higher level than those at the bottom of the pulse. Due to my health, I have gone through it many times and always managed to return to my place without any major problems. So personally I think the system does its job and I’m glad it works the way it does now because I used to be permanently locked at a low rank and as a result I stopped playing because I was just bored of the game. Soon I will start a new treatment that should stabilize me and improve my results, then I will write about my observations and maybe together we can solve the mystery. It’s good to know there’s someone with an analytical approach here.