The forced 50% w/r does exist on paper in solo comp

Um, if you’re talking about the video above, that was at GDC (the Game Developers Conference) and it wasn’t to students, but to other experts in Josh Mercer’s field, the same Josh Mercer who worked on the OW matchmaking system (among many others).

Which…I thought you knew, but just in case (and assuming you’re not talking about a different video), I thought I’d let you know.

1 Like

Not sure if you’re purposely avoiding the question or not, so I’ll rephrase it using your own comments so it should be impossible to misunderstand. You said:

“50/50 matchmaking still gives us the best opportunity to notice players are a different skill than their current ladder rating would suggest.”

Explain how the matchmaker would “notice players are a different skill than their current ladder rating would suggest.” Again, I know the answer already; I’m asking in order to see your answer.

Good job man, you got me. It wasn’t in front of students, it was at GDC, great job on your “gotcha” moment. This adds nothing useful to the conversation whatsoever, you’re just saying stuff to pick things apart because you’re desperate for some little win. Maybe later on I’ll make a typo and you can say “they’re not there” for another sweet gotcha moment. What a waste of time.

Dude. Calm down. IF you were making a mistake, it was a reasonable one, but also kinda important in the context of how seriously you took the video.

I wasn’t being a jerk here, it was an honest correction based on something you actually said, and I tried to be careful in stating that you likely knew it. And you COULD have been talking about a different video.

Do you always look for the worst in everything you hear from others?

1 Like

Okay. So.

We learn that a player is more or less skilled than we previously thought because they are able to change the outcome of our pre-match prediction. Our pre-match prediction is based upon our current understanding of the skills of all the players in the match. And our ladder ranking (in a competitive ladder system) is intended to reflect the actual skills of the players in the match.

If we want to know if any of the players in the match differ from our current understanding of their ability, we need to give them the best possibility of upsetting our prediction. (Because, again, our prediction is based upon our current assessment of their skill.)

When we make a 70/30 match rather than a 50/50 match, we are giving the players less of an opportunity to upset our prediction. So we can see if our prediction is really wrong (if the 30 team wins the match), but we cannot see if it is only slightly wrong. Which lessens our ability to learn about the players in the match.

In the 50/50 match, though, we maximize the chances that our prediction is wrong. (We essentially ensure that it must be a bit off, because one team will win.) This allows us to see finer gradations in skill among the players (rather than simply noticing when we are really wrong.)

Consider the following scenario:

  1. We make a match between 2 teams. I’ll use only a single number to represent the skill of the players in this match. Team A has the following 6 players: 105, 95, 99, 115, 120, 111 for a combined skill of 645. Team B has the following 6 players: 55, 85, 90, 73, 67, 60 for a combined skill of 430.

We would expect the first team to win that match the majority of the time. How much SR should we award them if they do, in fact, win? It should be based on a better understanding of their skill than we had before the match. (That’s the whole reason we make the match in a competitive ladder- to better rank the players according to their skill). But how much did we learn by making this lopsided match? That first team would, more than likely, still win if we replaced one of those players with a significantly worse player. We could take any of the players on that roster and replace them with a player who had a skill rating of 30 less and that team would still win.

So we do not know if those players are as skilled as we think they are.

That’s the key point. By making the lopsided match, we cannot tell if any one of those players is much worse than we think they are. Our rankings have a high uncertainty.

If, instead, we make a 50/50 match:

  1. Team A has the same 6 players: 105, 95, 99, 115, 120, 111 for a combined skill of 645. But now, Team B has these new 6 players: 101, 99, 103, 111, 113, 118 for a combined skill of 645.

Now who wins? We honestly do not know. But it’s probably the team who has one or more players who are a little bit better than we thought they were. We can tell players who differ from our current understanding of their skill to a much finer degree. Now if we replace any of these players with someone a bit better or a bit worse, we impact the outcome of the match.

That’s why we learn more about the relative skill levels of the players when we push for 50/50 matchmaking.
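
If it helps to see the numbers, here’s a minimal Elo-style sketch in Python. It is not Blizzard’s actual formula- just a generic logistic prediction and update, with the SCALE and K constants picked arbitrarily for illustration- but it uses the two scenarios above (645 vs 430 and 645 vs 645):

```
# A generic Elo-style model, NOT Blizzard's actual formula.
# SCALE and K are arbitrary constants chosen purely for illustration.
SCALE = 100.0   # skill-point gap that corresponds to a heavy favorite
K = 24          # maximum size of a single rating adjustment

def win_probability(team_a_skill, team_b_skill):
    """Pre-match prediction for team A, logistic in the skill gap."""
    return 1.0 / (1.0 + 10 ** ((team_b_skill - team_a_skill) / SCALE))

def rating_update(prediction, team_a_won):
    """How far the result moves our estimate of team A."""
    outcome = 1.0 if team_a_won else 0.0
    return K * (outcome - prediction)

for label, a, b in (("lopsided (645 vs 430)", 645, 430),
                    ("even (645 vs 645)", 645, 645)):
    p = win_probability(a, b)
    print(f"{label}: prediction {p:.3f}, "
          f"update if A wins {rating_update(p, True):+.2f}, "
          f"update if A loses {rating_update(p, False):+.2f}")
```

Run it and the lopsided match moves our estimate by a fraction of a point when the favorite wins (we only learn something if the upset happens), while the even match moves it a meaningful amount either way- which is the “finer gradations” point.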

1 Like

tale and I both have other posts that describe this. A while ago I made a post that linked them here so you don’t have to wade through a bunch of links.

And here’s another post that I wrote 4 years ago when I was answering the same questions you’re asking now.

2 Likes

My god. I just spent a good 30 mins scrolling back through some of those posts. You and others really did explain all of this back in 2018, didn’t you? There were some really good posts with really nice explanations of how these systems work. And there were several of you at the time using various language and analogies and ways of understanding this that worked together to give an excellent breakdown of how various competitive ranking systems do work, could work, have worked, and how they each relate to the system that OW uses.

Good stuff.

And already present in the discourse way back in 2018. I feel like anyone who had an honest desire to understand probably could have at the time. That’s why it’s so mystifying that this whole “rigged matchmaker” and “forced 50% win rate” narrative persists.

1 Like

Back when I was still a Saint like you are now.

Come to the Dark Side, young Padawan.

2 Likes

I don’t know. I’m pretty sure I’ve been less patient in a post or two than I could have been over the past month or so. When you are teaching (at least when you are teaching adults who are paying you to teach them), people pretty much want to learn and understand that they have things that they do not yet know.

This sort of forum discourse is different and you never really know if someone is simply trolling, is honestly struggling to understand, or if they have some other weirder agenda at work. It makes the whole conversation harder.

3 Likes

If you follow my later conversation with Lhun, you see the agenda change. At first, he’s legit asking questions, but he ends up going off the deep end.

You can both not understand and have a scapegoat, but not everyone who can understand can find a way to cope without the scapegoat.

That’s why you see me try to redirect to more plausible causes of their ire. You seem to do it simply because it’s true, but I really, really focus on the idea that yes, your teammates can be bad and that’s ok, here’s how the system works (and how it fails!) so you can stay calm about it.

Maybe you can figure out how to do it better than I, as making people feel better isn’t really my skillset.

ETA: I don’t know what I did with the original 9-page document, but before role queue, LFG, and the endorsement system, I created a hybrid matchmaker and posted it on Reddit. I’m sorry I deleted it, but you can look at the comments and get the gist:

https://www.reddit.com/r/Overwatch/comments/7y87hy/jeff_and_the_overwatch_team_admit_they_have_a/

1 Like

That’s the point of matchmaking. If the game perceives you as good, you will meet better players, and vice versa.

3 Likes

I feel like the thing is probably too entrenched in some people at this point. It’s become part of their identity, and that gets really tricky. There may have been a moment early on where things could have gone differently (but I think you guys did an excellent job at the time- I certainly don’t think I could have done any better.) But at this point people have put so much time and energy into convincing themselves and each other that there’s real investment.

Hell, there’s a culture that’s developed. And that’s hard to give up- even if there isn’t some sort of financial incentive involved.

It’s really interesting to study though. That’s probably, more than anything, why I still participate in these threads at this point.

I was really fascinated to learn, for instance, while perusing those 2018 threads, that the theory has not been modified to address some of its fundamental incoherences at any point over the last 4 years.

There’s a whole ‘do we want random or do we want skill based matchmaking’ contradiction that was present in those early days, and is still present in the theory today. We seem to want random (because we talk about how the most skilled players should have really high win rates), but we also say we are going to make matches only between people of similar skill ratings (which would preclude those really high win rates we seem to want to see happen- unless skill rating is just a name and it has nothing to do with the actual skills of the players). It’s a pretty fundamental contradiction that’s just baked into the theory.

And that’s what I was trying to get at earlier with my unicorn believer analogy- people should be able to recognize, on some level, when this stuff is pointed out that they do not have really solid reasons for what they are saying, nor do they have a really solid theory that stands up to scrutiny.

If they still want to express a preference (even if it isn’t the most well thought through preference)… whatever- folks prefer what they prefer and think what they think.

It’s just that when they purport to be experts or have a really solid understanding of what’s going on, when it’s clear they don’t, it evinces much less self-awareness and it makes it much harder to have an honest and productive conversation.

1 Like

My favorite is when they complain about a forced 50%, but then say to create games based on SR, not realizing that the complaint and the solution are the same basic system.

As far as I know, I’m the only community member to try to develop a system to address the perceived problems. It became outdated with the introduction of LFG and Role Queue, but I posted the link in an edit to my last comment, if you’re curious.

1 Like

Bro…. I, for one, am grateful for the amount of effort you’ve spent trying to clear things up…. But I’m not entirely sure you realize how zealous some of these people are.

P.S. got rolled by the GMAT last year when I applied to grad school :sweat_smile: still got in though.

Some guy around here named Nano, not 100% aligned with the rigged cultists, tried to write a simulated SBMM in Python (I think) about a year ago.

It was okay. But not very complex. And cutbert and receipts tried to skew the presentation (as expected).

1 Like

No, normal distributions would occur from randomly selecting players. If you have an algorithm, it will have a bias.

How do you know someone belongs in a certain rank if you modify the difficulty sufficiently to prevent them from winning more than 50% of their games?

1 Like

less chance of a flat tho!

2 Likes

So, I’ll go through how the GMAT does it again (just so we can see that this isn’t something that only OW does, it’s something that all sorts of competitive ranking systems do- the GMAT has no reason to try to rig the system against anyone and it’s accepted by all the best universities as a metric for ranking people according to their skill):

When a test taker sits down to take the GMAT, the algorithm selects a question of medium difficulty to start. (This is the equivalent of being given a gold match when you play your first comp game.) But every question after that first one is selected based on the test taker’s past performance. If the test taker gets that first question correct, they are given a harder question. If they miss it, they are given an easier question.

How does the GMAT know what difficulty of question to select?

It uses a whole host of criteria (that are tracked from the test taker’s prior performance in that Quantitative reasoning or Verbal reasoning section), but at its most basic level, the GMAT is trying to get the test taker to have an overall 50% accuracy- the test is trying to get you to miss half of your questions.

And it’s pretty good at it. It has a huge question bank with questions at a wide variety of difficulty levels to select from. Roughly 80% of test takers will end up with an accuracy of approximately 50%. Only the top 10% of the ladder and the bottom 10% of the ladder will deviate significantly from that 50% accuracy.
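
To make that loop concrete, here’s a toy version of the selection rule in Python. The real GMAT uses item response theory and a bunch of criteria I’m not modeling here- the ability scale, step size, and probability formula below are all made up for illustration- but even this crude “right answer gets a harder question, wrong answer gets an easier one” staircase pushes accuracy toward roughly 50% while settling at very different difficulty levels depending on the test taker’s actual ability:

```
import random

def p_correct(ability, difficulty):
    """Made-up model: chance of a correct answer rises with ability, falls with difficulty."""
    return 1.0 / (1.0 + 2 ** (difficulty - ability))

def run_test(true_ability, num_questions=40, step=1.0):
    difficulty = 50.0                 # start with a medium question
    correct = 0
    for _ in range(num_questions):
        if random.random() < p_correct(true_ability, difficulty):
            correct += 1
            difficulty += step        # correct: serve a harder question next
        else:
            difficulty -= step        # missed: serve an easier question next
    return difficulty, correct / num_questions

for ability in (42, 50, 63):
    final_difficulty, accuracy = run_test(ability)
    print(f"true ability {ability}: settled near difficulty {final_difficulty:.0f}, accuracy {accuracy:.0%}")
```

The middle test taker lands right around 50% accuracy, while the ones far from the starting difficulty stay above or below it- which lines up with the point above about the very top and bottom of the ladder deviating from that 50%. What separates everyone is where the difficulty settles.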

So, to your question- how does the test know someone belongs in a certain rank (or should have a certain score in this case) if it modifies the difficulty sufficiently in order to push test takers away from answering more than 50% of their questions correctly?

Two points:

The first is that any given test taker can answer more than 50% of their questions correctly. (I usually get between 80 and 90 percent of the questions correct when I take the GMAT, for instance- that’s enough to place me well within the top percentile of test takers on that test.) And any given OW player can win more than 50% of their games. (Players at the very top of the ladder will do so. As will most players who are currently ranking up.)

The second point is that the GMAT is selecting questions in order to learn more about the test taker. (Just as the matchmaker is selecting matches in order to learn more about the players in that match.) And it learns something different if it is able to push a test taker to a 50% accuracy using only Easy questions than it does if it has to pull out the hardest questions in its question bank in order to push a test taker toward a 50% accuracy. And something else again if it is never able to push a given test taker toward a 50% accuracy, even using the easiest or the hardest questions in its question bank.

Consider the following scenario:

I ask you to rank an OW team. These are a bunch of amateurs. We first pit them against a team of low bronze players. Our team wins. So we pit them against a team of high bronze players. Our team wins. So we try them out against a low silver team, and they lose. We try another high bronze team and they win. So we try them out against another silver team, etc. Our team ends up hovering around a 50% win rate when we pit them against teams in the high bronze to low silver range.

We now run the same set of matches with a different amateur team. But this team wins all of those silver games as well. They do not hit a 50% win rate until mid Masters.

Could we not, then, say that the first team is somewhere around the bronze-silver border, but the second team is somewhere in mid Masters- even though we manipulated the difficulty of their matches in order to push them toward a 50% win rate?

Just as the GMAT does.
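
And here’s that team-placement version as a small Python sketch. The SR value I’ve assigned to each tier and the logistic win model are placeholder guesses of mine, not anything Blizzard has published- the point is just that sweeping a team across opponents of increasing rank and finding where their expected win rate is closest to 50% is what places them, whether that’s at the bronze-silver border or in mid Masters:

```
# Placeholder SR values for the middle of each tier and a generic logistic
# win model- my own guesses for illustration, not Blizzard's numbers.
TIERS = {
    "low bronze": 700, "high bronze": 1400, "silver": 1800, "gold": 2300,
    "platinum": 2750, "diamond": 3250, "masters": 3750, "grandmaster": 4200,
}

def win_chance(team_skill, opponent_sr, scale=350.0):
    """Expected win rate against a team from a given tier."""
    return 1.0 / (1.0 + 10 ** ((opponent_sr - team_skill) / scale))

def placement(team_skill):
    """The tier whose teams give this team the closest thing to a coin-flip match."""
    return min(TIERS, key=lambda tier: abs(win_chance(team_skill, TIERS[tier]) - 0.5))

# Team one stomps bronze but loses to silver; team two keeps winning until mid Masters.
for name, skill in (("amateur team one", 1600), ("amateur team two", 3750)):
    tier = placement(skill)
    print(f"{name}: closest to a 50% win rate against {tier} teams "
          f"({win_chance(skill, TIERS[tier]):.0%}), so that's where we rank them")
```

Both teams get pushed toward a 50% win rate; the information is in which opponents it takes to get them there.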

1 Like

That doesn’t work for an individual who is solo-queuing. Every game they change teams. So your entire premise is moot. There is no way an algorithm that ranks teams can rank an individual within that team. Period.

Consider the confounding co-factors:

  1. Team composition and role flexibility.
  2. Skill breadth (can your teammates switch to a hard counter? will they?) and skill depth (how much experience do they have on any given role, and in how many roles do they have that depth?).
  3. Variance. Some days you or your team mates might be in-the-zone, or they could be tired or not sufficiently warmed up.
  4. Sociability. More sociable people are more likely to be on comms and more likely to group up. Groups who are on comms are more likely to win. That is not an individual skill, but a team skill.
  5. Complexity. There are 31 heroes with 4 abilities each, and many of these abilities have synergistic or compounding effects. To build an Elo rating from the combinations of these would be next to impossible.
  6. False metrics. Have you ever won a game, and the enemy team got all the cards, PoG and mysteriously 150% of the eliminations? I’ve been in games where we won simply because we had a teleport from spawn to halfway to the point, giving us a 6-second advantage in reinforcing our attack. The enemy team got more elims, more ultimates, did more damage, but we won effectively through strength of numbers.
    Another game we won because I spent the whole game as DVa shooting Sym and Torb turrets, and not much else. I got no medals, no card, no endorsements. It was QP, but if it had been comp, I would have received a lower SR bump, simply because sometimes the things that win you games are not rewarded. And sometimes the things that lose you games ARE rewarded. For example, doing a lot of damage but not getting many eliminations builds the ultimates of the support roles. You can have gold damage, but lose the game for your team because Moira gets ultimate every 30 seconds.

The SR doesn’t measure skill or elo the way an actual game does. It measures something else, something for a new generation of gamers. A little bit like WWE.

This is a good critique of the overall problems with ranking individuals in a team based game. But this critique would be equally valid no matter how our matchmaker worked. It’s just a fundamental complexity that is baked into the system.

At the end of the day, OW will always be a team based game that nevertheless seeks to rank individual competitors with its competitive mode.

None of your critiques apply any more to a matchmaker seeking to push players toward a 50% win rate than they would to a matchmaker that makes matches more randomly. The only critique you are making in this post that applies more to the current matchmaker than to something like the most commonly proposed matchmaker (which does not use hidden MMR or performance-based metrics) is your point 6. But even then it is a critique of using performance-based metrics rather than a critique of seeking to make 50/50 matches.

The question you posed was originally- how does a system such as the one OW uses (or the one the GMAT uses) know where to place people when it pushes them toward a 50% win rate? That was the question I was answering.

Do you get how that works now?

2 Likes

As a reminder, PBSR used to be much, much stronger, so we know how tuned down it currently is. Most normal players aren’t getting much. You have to really stomp to see the effects you used to see, and as far as I can tell, do so on a regular basis. As in, one really good game isn’t affecting your SR.

I will caveat that I only think this from seeing other posts about SR gains from people who really are playing outside of their rank. It’s a small sample size and has the same anecdote issues as most accounts.

But, I do know PBSR isn’t nearly as strong as it used to be.

2 Likes