Algorithmic Handicapping (MMR) is Wrong for Overwatch

2 out of 10 on the punnery scale. have you tried googling better ones?

3 Likes

Imagine there’s an X-Men character whose birthname is Juan, and whose mutant power is the ability to duplicate himself. His name for the X-Men squad? “More than Juan.”

I just want to mention, there is a new video posted to my YouTube channel. It relates to Overwatch, anyone please view and let me know if you want to discuss!

2 Likes

This is not the definition of fair. Fair is random: everyone has the same random chance to get the trash Genji one-trick, and everyone has the same chance to get the superstar hard-flex hitscan.
Handicapping (which is a forced 50-50) is not fair.

3 Likes

A random matchmaker would be much less able to accurately determine player skill, would have lower match quality, and would result in far, far more complaints. There is a reason (actually multiple reasons) that well designed competitive ranking systems try to create matches that have a roughly 50% chance for either team to win.

Overwatch is not the only system that does this. Computer adaptive testing also does this. Multiple experts designing these systems have all recognized the benefits of pushing people toward a predicted 50% win rate. I have written about this extensively before and can do so again if it would be helpful.
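To make the "roughly 50% chance for either team" idea concrete, here is a minimal sketch using an Elo-style logistic curve. The function name, the 400-point scale (borrowed from chess Elo), and the sample ratings are illustrative assumptions; nothing in this thread says Overwatch computes its predictions this way.

```python
def expected_win_probability(team_a_rating: float,
                             team_b_rating: float,
                             scale: float = 400.0) -> float:
    """Elo-style estimate of team A's chance to beat team B.

    Assumption: a logistic curve with a 400-point scale, as in chess Elo.
    Overwatch's real prediction formula is not public.
    """
    return 1.0 / (1.0 + 10 ** ((team_b_rating - team_a_rating) / scale))


# Two teams with the same average rating: a predicted coin flip.
even_match = expected_win_probability(2500, 2500)

# A 400-point gap: the stronger team is a heavy favourite.
lopsided_match = expected_win_probability(2900, 2500)

print(f"even match:     {even_match:.2f}")
print(f"lopsided match: {lopsided_match:.2f}")
```

A matchmaker that targets a predicted 50% win rate is, in this picture, just one that tries to keep the gap between the two team ratings close to zero.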

2 Likes

The only benefit is they get more money.
Do pro sports teams say, “You know, let’s give our best players to another team”?

2 Likes

No. But if you were an outside observer and wanted to determine if any given player was better than some other player, would you rather put both players on comparable teams and see how those individual players play, or would you put one player on a team of other professionals and the other on a team of elementary school players?

Random matchmaking says the latter is a good way to determine which player is better. Matchmakers that push toward a predicted 50% win rate say the former is the better way to determine which player is better.

The matchmaker does not care who wins. It only wants to determine who should rank up, which is to say which player should be ranked higher than the others on the ladder. Your analogy is focused on benefiting some players over others (my team should get the best players) rather than most accurately determining which players are more skilled than others.

2 Likes

But the matchmaker does care who wins. MMR is designed to ensure that everyone has a 40%-60% chance of winning every match. And in every match it puts the most skilled players on opposite teams. In so doing, the Matchmaker sacrifices the advantage of those players’ skill. Being measurably better than others in your SR bracket automatically flags you for handicapping. By the same token, the Matchmaker benefits players who are relatively unskilled.

Again, the Matchmaker does care who wins. It cares that skilled players don’t do too much winning, and unskilled players don’t do too much losing. And it forces those things not to happen by handicapping players via match placement and team assignment.

That’s not what I said! Statements from Overwatch’s Principal Designer and Lead Developer, as well as patent filings from Overwatch’s publisher, PROVE that Overwatch uses algorithmic handicapping in ranked Competitive Play. Further, those statements and patent filings are EVIDENCE of the problem I’m suggesting. Whether that evidence constitutes proof of my argument’s truth is for the reading public to decide.

You’re calling me a liar, which is an insult I take seriously. But you aren’t backing it up with anything?

Cite one example. Tell me a single inconsistency I’ve uttered which is factually untrue. You can’t. I demand your apology.

2 Likes

This is a fundamental misunderstanding of how the system operates and even what the goal of the system is. If I prefer to watch college players play against college players in order to better understand which players at that broad skill level are better than the others do I care which players among those college players win?

No, I just recognize that matching college players against elementary school players will not allow me to meaningfully determine which college players are better.

Likewise, when a computer adaptive test gives a test taker a question that it expects that test taker will have a roughly 50% chance of getting correct, the test is not doing so because it wants to hold more skilled test takers back. It is doing so because it is designed with a sufficient understanding of competitive ranking systems to recognize that setting up an expected 50% win rate is the best way to assess the performance of unknown test takers.
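A small sketch of why a roughly 50% question (or match) is the most informative one. It treats each result as a single pass/fail outcome and measures how much uncertainty that outcome carries. In the simplest item response model (Rasch, or 1PL), the information of an item is p * (1 - p), which peaks at p = 0.5. The function names and probability list are made up for illustration, and this is a simplification, not a description of how the GMAT or Overwatch actually score anything.

```python
import math


def outcome_information(p: float) -> float:
    """Variance of a pass/fail outcome, p * (1 - p).

    In the Rasch (1PL) model this is also the item information.
    """
    return p * (1.0 - p)


def outcome_entropy_bits(p: float) -> float:
    """Shannon entropy (in bits) of a pass/fail outcome with success chance p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome tells us nothing new
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


# Hypothetical predicted success chances for five candidate questions/matches.
probabilities = [0.05, 0.25, 0.5, 0.75, 0.95]

for p in probabilities:
    print(f"p={p:.2f}  information={outcome_information(p):.4f}  "
          f"entropy={outcome_entropy_bits(p):.3f} bits")

most_informative = max(probabilities, key=outcome_entropy_bits)
```

A result that was 95% predictable in advance barely moves the estimate; a coin-flip result moves it the most.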

This does not, in any way (and I cannot stress this enough) sacrifice anyone’s skill. In fact, it allows skill to meaningfully express itself. Stomping over competitors of wildly divergent skill levels is not an expression of skill. It is, in fact, running away from an honest expression of skill. No one cares if Lebron James beats a middle schooler. Lebron James needs to play against the best players in order to make his case that he is the best.

That’s how this always works. The idea that stacking the best players on one team and having them play against people who are much less skilled allows those best players to demonstrate and hone their skills is both nonsensical and timid. It’s an attempt to avoid a true contest of skill and to obfuscate a true ranking of skill.

No well-designed competitive ranking system operates that way. It’s not handicapping to have to play against equal opposition.

4 Likes

Look you’re not wrong. But your computer adaptive skill test is a solo adventure. It is not a team based game.

That’s the problem that a lot of us see. Using metaphors, each rank should be equivalent to a level of competition.

Bronze - Little league. This is where you learn the fundamentals, strategy, how to win.
Silver - Junior varsity. More strategy, why teamwork is better, refining fundamentals into beginner’s skill.
Gold - Varsity. Skills can range from beginner to novice; players understand the full fundamentals, even if they can’t pull them off.
Platinum - College-level athletes. Some superstars… but mostly slightly above average.
Diamond - Top college athletes/minor leaguers.
Master - Minor league to low-producing major leaguers.
Grandmaster - Major leagues.
Top 500 - Superstars.

At least that’s the way I see it.

Your adaptive test example is you, and only you, not you and a team of 5. This works well for solo sports, and maybe team sports that ALWAYS have the same teammates (that could be seen as one entity), but with solo, duo, or triple queue it doesn’t work right, IMO.

SR is individual or it isn’t.
Decide

My point is that all of the supposedly harmful things blizzard is using to rank folks (separate SR and MMR, performance based SR gain/loss acceleration, pushing people toward a 50% win rate, etc) are also used in every other competitive ranking system.

Blizzard uses those systems for the same reason that the GMAT does: in order to create the best competitive ranking system they can. Yet this thread (and the duplicate threads) wants to suggest that there is some singular aspect to OW’s matchmaker.

What is it?

People have said the 50% thing. But that cannot be it because it is used in other well-designed competitive ranking systems. I’ve seen people point to the separate SR and MMR, yet other well-designed competitive ranking systems use that as well. Still others will suggest that performance based accelerated SR gains and losses is the culprit. Nope it’s used in, you guessed it, other well-designed competitive ranking systems.

Why would the GMAT use these systems if they hurt more skilled test takers? Why would top tier colleges accept that? The GMAT is literally designed to rank all test takers at whatever skill level and to find finely graded distinctions among them so that it can assess someone who is in the top 1% of test takers vs. someone in the top tenth of a percentile or the top hundredth of a percentile.

In order to do that, the test uses the same systems that OW uses.

None of this is to say that there aren’t problems, but most of those problems are with player behavior. (And, across a large sample size, player behavior impacts each of us roughly the same.) In terms of issues with the matchmaker currently, I would suggest the largest issue is that it compromises its matchmaking in order to reduce queue times. But that, too, is largely a player issue: if the community weren’t loudly decrying lengthening queue times, there would have been less pressure to prioritize queue times over more accurate matchmaking.

In terms of team queueing, I think one could make a meaningful argument that OW should only rank teams against other teams, having a separate solo queue and team queue. That might make some sense. It would also exacerbate some of the issues that we are currently experiencing (in terms of queue times and fracturing the player base.) That’s honestly a personal preference thing though and very unlike the claims made in the OP.

1 Like

You still have a random element: the game will randomly pick team members so that each team has a fair chance to win.

1 Like

No.
Not only are you factually wrong about the stats/testing, but you’re missing several critical factors that make random-around-rank matchmaking a requirement for legitimate esports.

Lesson 1: Statistical Reasons

Random is a more robust baseline. In this case it’s the best (optimal power/efficiency for all noise thresholds).

Win against a uniformly sampled backdrop of the rank and convert, or don’t. It’s a robust datum because everyone has the same noise level, so individual signals are easy to separate.

Note the datum still “adapts” in the sense that it moves with the average rank of the lobby. This is extremely powerful without the costs/risks of false adapts from your GMAT analogy.

You can devise more elaborate tests (schemes at that point) which pad with placebos and error correction, etc. Those will get higher performance in a targeted region of your statistical tests. But they shatter and break down in a resampling no-reset (net entropy gain) environment. This ladder design is terrible for “adaptive codes” because it has bad actors and no resets. Your “adaptive” analogies lose because those can be spoofed/evaded for less VC cost (Vapnik–Chervonenkis shattering).

Is “adaptive” the undergrad word for “rigging?”

“Meaningful expression of skill” is hedged away with algorithmic handicapping. Forcing 50-50 maximally buffers that distinctiveness away. It makes distinguishing oneself through conversion agency over the lobby that much harder.

You’re burying everyone’s signal into a rigged lobby. This is OK for lowpop/sparsity conditions. Good for placements, when you want to rapidly sieve relative orderings but don’t care about the distance apart. You get more upfront information gain and rank-reversal surprisal. But you pay for it over highpop/dense playouts. Overall it’s less expressive because, you literally rigged it all away!

Lesson 2: Metricisability

For skill expression to be meaningful (in a quantitative sense) the metric needs to meaningfully link payoff/reward feedback with proper label. For SR this means directly mapping % players by skill.

With random sampling, SR becomes THE measure of skill. There won’t be “divergent skill levels” for the same SR once the lobby formations aren’t rigged and the ladder settles people into their proper ranks.

Lesson 3: Psychology

Psychologically, the competitive drive in people won’t min-max if the labels are fake and lobbies are rigged, I mean ‘adapted’, via MMR. Delayed or “lied to” performance feedback (hosted at the wrong rank, kept down by MMR 50-50) is OK up to a point (initial engagement), then it’s just terrible for sports psychology (Journal of Sports Medicine, Fletcher, Gilles et al. Measuring Well-Being in Sport Performers: Where are We Now and How do we Progress?).

Lesson 4: Environment

Delayed rewards shouldn’t be taxed with adaptive tests in a noisy environment. The dynamics of this ladder system are both stochastic as well as adversarial.

You can extract all the meaning and expression you need just as efficiently and with lower misclassification error rates by letting players rise above randomness for their rank (some narrow-band interval). This band moves with them as they rise and in fact becomes less random (higher rank higher variance reduction). So the random datum literally adapts and filters for you.
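For anyone unsure what “random around your rank” would look like mechanically, here is a rough sketch: take everyone whose SR falls inside a narrow band around the player and draw the rest of the lobby uniformly from that band, with no hidden rating involved. The band width, the lobby size of 11 other players, and the generated pool are all made-up assumptions for illustration.

```python
import random


def sample_lobby(pool, player_sr, band=100, lobby_size=11, rng=None):
    """Draw the other players of a lobby uniformly from an SR band.

    Hypothetical sketch: `band` (SR points either side of the player) and
    `lobby_size` (11 others for a 6v6 match) are illustrative values.
    """
    if rng is None:
        rng = random.Random()
    candidates = [sr for sr in pool if abs(sr - player_sr) <= band]
    if len(candidates) < lobby_size:
        raise ValueError("not enough players inside the band")
    # Uniform sampling without replacement: no balancing, no hidden rating.
    return rng.sample(candidates, lobby_size)


rng = random.Random(0)  # seeded so the sketch is repeatable
pool = [rng.randint(1500, 3500) for _ in range(5000)]  # made-up SR values

lobby_at_2000 = sample_lobby(pool, 2000, rng=rng)
lobby_at_3000 = sample_lobby(pool, 3000, rng=rng)

print("lobby around 2000 SR:", sorted(lobby_at_2000))
print("lobby around 3000 SR:", sorted(lobby_at_3000))
```

The band is recentred every time the player’s SR changes, which is the sense in which the random baseline “moves with them as they rise”.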

Lesson 5: Privacy, Ethics, Sports, and Competition Reasons

Sports and esports are about keeping things private, stateless, and intangible. You 1-off perform, convert, or don’t. Anything tracked isn’t used to shuffle rig or ‘adapt’ teams between each round or match. That would be preposterous.

The SBMM and EOMM systems go full privacy invasive. DDA is a whole other level of rigging we won’t talk about here. But for these “tests” to work, they have to assume 1. no1 spoofs, and then 2. slurp your data, crunch it and (maximally) rig the outcome (50-50 is the most reduced agency/affectance).

Over time they have accumulated so much data and predictive power they can basically Watson to within decimal places several parameters of the match (not just outcomes like expect/vary but margin of victory and basically any of the other features they track).

And yet they fail (VC shatter type failure) left and right because of over-fitting. Anchors stats, no-resets, alts.

You don’t want live-adaptive tests in sports.

That takes away the entire point of competition, which relies on varying and unquantified uncertainty. The point is to not design away the agency and expression over a sample of your rank. That’s literally how you show your rank, meaningful and earned, and your ability to compete for more rank. When it’s rigged, you’re competing under a different set of controls that squeeze away everything natural. That’s privacy invasive, discriminatory, unfair and unethical. Soon, it will be illegal.

Lesson 6: Fun is subjective and moot

Are rigged matches more fun because they’re close? I haven’t read up on that psychology. I would argue sharp labels are “fun”. That non-discrim, non-invasive, non-adapt rankings are “fun”. I don’t find it fun to have my lobbies tampered with. I don’t find fake accounts fun, and I absolutely find it fun to stomp or be stomped knowing it’s just a random competition process as I ladder and progress.

1 Like

So you do actually believe that we are better able to judge Lebron James’ skill if he is randomly assigned opponents that may be far below his skill level than we are if he is placed against opponents as close to his skill level as possible?

You do think that we can meaningfully judge Lebron James’ performance by mostly watching him play against amateurs? And that Lebron James’ skill is best expressed and demonstrated and honed if he only rarely plays against other professionals of his caliber?

That is your honest belief?

Further, you think having Lebron James play against other professionals (rather than someone like me who is a random 5 ft 9 in amateur) is rigging the matches? That unless Lebron gets to utterly dominate random opponents who are far, far less skilled than he is he should throw his hands up in the air and decry this rigged system that holds him back from truly demonstrating his skill?

Like, I get that people have a range of preferences and that people might (for whatever reason) prefer random matchmaking. I just wish we could have an honest conversation about the results of random matchmaking- lower quality matches, less accurate skill ratings, and lower overall quality of play.

2 Likes

I didn’t read a word of that. Random is the best way to play this or with only 6s

1 Like

I watched your video, and it’s quite interesting how you talk about being handicapped by the MMR system, because in that same video you include gameplay that is completely contrary to what you are saying. You are complaining about how the system is holding you down and is blocking you and other players from ranking up, but after seeing the gameplay that you provided it is clear that you truly have no understanding of the game and that you truly do not deserve to be a higher rank. After many posts of you claiming that the system is rigged and many people are “victims” of “discrimination” it is now clear to me and others as well that every complaint you make about the “system” keeping you down has no merit.

I am glad that you may find enjoyment in writing these articles, but I hope that you eventually realize that the blame doesn’t lie within the system, but within your own shortcomings. I do not say this to be rude or to be aggressive; I say this as a straightforward, direct message to tell you the truth of your situation. I hope that once you quit these forums, you enjoy life elsewhere.

2 Likes

Of course not. Robust testing means not adapting the test between output sessions, and using everyone at the said rank to baseline the skill delta. Random from Lebron’s rank means with/vs. other NBA players. Would a Watson-backed cherry-picked all-star game full of his counters reveal his skill better? In a one-off sense it could (for Watson, not humans). But it would also throw out of whack the skill calibration of all kinds of other talent. It’s not a useful metric at large. You test him vs. NBA games. Swapping teams around between quarters with some kind of non-random selection bias (in this case, the worst kind - maximal handicapping) is just ludicrous.

Ofc not. No one is suggesting this, so why even waste the time asking? Did all the statistical lessons go over your head?

We can’t have an honest conversation when people confound what “random from your rank” even means. We’re talking about non-adaptive matches, i.e. no MMR, just using SR to ship and rank. It’s the best measure of skill you can derive. More accurate, more fair, and the quality of play goes up at the ladder level. No point in “forced” matches if it makes the skill metrics fake.

Correct. “Random around your rank” is how you should be tested and judged for a population-based competition. As you prove yourself and go up in rank that “random around your rank” has obviously changed to be a new higher region of sampling. MMR is a tailored, rigged experience that fails for many reasons some of which I listed above.

3 Likes

The system is messed up

Okay. So you do want matches made by skill rank then? You do not want randomness. Random from your rank is how the matchmaker works now. (I mean, it’s not entirely random- opponents are selected based on comparable skill and low ping etc.)

Random is just another way of saying I want the matchmaker to work as it works now based on what you are saying here.

Others were earlier suggesting that they did not want matches made based on skill rating. They truly wanted randomness instead. But it sounds like you and I are in agreement that random matchmaking would be a much worse system than the one we have now which matches based on skill.

And the only reason there is a difference between SR and MMR is so they can penalize leavers and such. If there were only one rating, then people who left matches would end up against easier competition. SR and MMR are functionally identical for the majority of the player base. MMR is the true skill rating and is hidden so that people care about the penalties they get to their SR, which is the publicly visible value. Basing the matchmaker on SR rather than MMR would result in worse matchmaking (though probably only marginally so for most players).
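Here is a rough sketch of the two-rating idea as I have described it: a hidden MMR that drives matchmaking and a visible SR that also absorbs penalties. The class, the function, and the 50-point penalty are invented for illustration; the real values and update rules are not public.

```python
from dataclasses import dataclass

LEAVER_PENALTY = 50  # hypothetical SR penalty, not Blizzard's actual number


@dataclass
class Player:
    mmr: float  # hidden rating, assumed to be what the matchmaker uses
    sr: float   # visible rating, shown on the ladder


def apply_result(player: Player, rating_change: float, left_match: bool = False) -> None:
    """Update both ratings after a match (illustrative rules only)."""
    if left_match:
        # The penalty lands on the visible rating only, so the leaver does
        # not get matched against easier opponents afterwards.
        player.sr -= LEAVER_PENALTY
        return
    player.mmr += rating_change
    player.sr += rating_change


regular = Player(mmr=2500, sr=2500)
apply_result(regular, rating_change=25)

leaver = Player(mmr=2500, sr=2500)
apply_result(leaver, rating_change=0, left_match=True)

print(regular)  # both ratings move together
print(leaver)   # SR drops, MMR unchanged
```

For a player who never leaves, the two numbers stay in lockstep in this sketch, which is the sense in which they are functionally identical for most of the player base.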

2 Likes

I have updated the original post with important information. I hope readers, and especially critics, will have another look.

1 Like