Why Match Quality is Frequently Poor

Mongolian wrestling has only one weight class. American boxing has many weight classes. Neither is right or wrong, just a different way to do things.

People report all sorts of things, often contradictory things with insufficient proof.

If someone bounces off 3200 a bunch of times, it is much less memorable than if they bounce off 3500 a bunch of times.

For everyone who went on a loss streak, someone else was winning. The people who were winning don’t come to the forums to complain.

If you want to show rigging, you need a lot more than, “some guy on the forum claimed that things are rigged.” SR over time records would be a good place to start, followed by a statistical significance analysis.


I’m aware. If outsourced to the community, the developers certainly need to manage the volunteers to make sure that they are following policy.

Jeff has mentioned hardware banning people, but I get the impression it is not done often enough.

I think you may not be getting the point (or we may not be communicating in some other way).

In team vs team mode, individual players do not have an SR. Only the team has an SR. This will make the performance of the team much more consistent, because team membership does not change. The normal queue issues like different people in a group gaining different amounts of SR won’t happen because individuals don’t have SR.

There needs to be some appropriate reward to make people want to stay together as a team. The only things I’ve thought of so far are an improved gameplay experience, lootboxes, unique portraits, and a leaderboard. But these don’t feel sufficient, so I’m still thinking about it.

Getting gamers to work together is hard. World of Warcraft manages to do this, mostly by locking gear and content behind the “You must work together” barrier. Even then, though, being a successful guild requires significant management skills and effort on the part of the guild leadership.

You bring up a good point. Things can always be worse. I’ve played games where I just get flattened (due to not being good at the game) for my first ten matches. After that, I just stop playing.

You know as well as I do (or you should) that with this type of data it is almost impossible to show any sort of “problem” with the matchmaker. Think of it like this: let’s pretend that I literally paid a boxer in real life to throw a match and purposely lose. The boxer he was up against was someone who would put up a good fight; it would have been an even match, but having my fighter throw it guarantees the loss.

My fighter goes a couple of rounds, then lets a punch hit him, falls to the ground, and doesn’t get back up. People accuse him of throwing the match, and I say, “But can you show me data and statistics to prove the match was rigged?” Of course they can’t, because it doesn’t work like that, but because they can’t provide data, they seem to be discredited.

That’s what’s going on here: you know the data we have access to will not show discrepancies in these rigged Overwatch matches, and so you keep asking for it as a way to discredit people. But you don’t always need data to know something fishy is going on.

Despite the difficulties in proving the rigged matchmaker, I have still provided evidence. I posted a video of a grandmaster player who was smurfing down in SILVER TIER and playing multiple matches that he almost lost (literally came-down-to-99%-on-the-last-map type of thing). This is evidence that something is wrong with the matchmaker. I watched him destroying people; he was probably carrying the weight of at least two people, and still it was an even match. I know you saw the post I’m talking about. Why don’t you admit that there is something strange about that?

Except you forget one thing: maybe group SR doesn’t change, but the individual skill, or the ability to win the game, will still vary across the 6 individuals. You still have that multiplier of variance that you have to take into consideration.

Sure, you can say that one unit will have the same variance as one person, but we all know that isn’t the case. If all 6 people have an off day, it will be a far greater drop in team performance than just 1 person of a 6-man team having an off performance. The same thing happens if everyone is having a really good day.

The result of the game is the same whether it’s a clan or an individual with 5 teammates, as is the effect that one individual’s variance in performance has on the game. You don’t have 1 individual on a team losing while the other 5 on the same team win; they win and lose as a group, which will affect everyone’s SR change, because you didn’t model for PBSR.

This variance of performance will get even greater as you swap individuals in and out of the static team.

For example, right now you modeled, safely, that a person’s individual performance varies by +/- 150 SR. But when the model has 6 people all at -150 SR, your group performance is not just -150 SR; it is much worse than a team that has only 3 people performing at -150 SR, who would only be performing at a relative -75 SR as a team. Likewise, a single individual at -150 will only bring their team down a relative -25 SR in determining whether their individual performance results in everyone winning or losing; a team that is, say, performing at -100 SR wouldn’t beat that team. The individual variance is on a flat distribution (the probability of being -150 is the same as being 0), whereas the team probability of being anywhere from -150 to +150 is on a bell curve.

But if you then give a team of individuals a flat and equal variance in their team performance, you ignore the probability and the impact of individual performance within that team.

Right now the numbers and data you have are based on the idea that skill variance is just 150… because that is as much impact as one individual can have on the team… when, say, it really is only a 25 SR actual swing at the team level.

Can a team where everyone is performing at -150 only be beaten by another team performing at -150? Or can they actually be beaten by a team 1000 SR below them? Or maybe it is 500; we don’t know. We don’t have the data to presume that the scale of skill is correct at the team level.

We don’t know, because right now the individual is buffered by their teammates and the SR scale is built around that. You don’t have teams of, say, silvers facing a team of golds with a 500 SR gap between them and seeing the probability of the silvers beating the golds. All we have is what happens if the team variance is 25 SR and an all-or-nothing win/loss. The team could be just 25 SR off because 1 person is performing 150 SR off, or the whole team could be at -150 and beatable by a team 1000 SR below them.

Considering individual SR has a range of 1000… and can impact the team by as much as 500 SR while still being considered fair, we have to consider that a team of silvers can beat a team of golds 500-1000 SR away.

Can your model survive that large of a swing in performance?

A friend of mine, who I don’t mind saying is an idiot, got HWID banned by Blizzard because he legitimately cheated. That said, that’s the only case I know of where Blizzard goes out of its way to issue an HWID ban.


First, let’s look at a simulated match history that looks perfectly natural, according to one forum user … S23.

I ran this through the analysis described at Overwatch Forums and found that this data definitely gets flagged as rigged. Specifically, there are way too many streaks of length 1.

Win/Loss Simulation and Data - Google Sheets shows real and realistic simulated data.

Your simulated data has more than 60% of its streaks at length 1, while it should be 20-30%.

This excessive number of short streaks shows up as an excessively high tendency to lose after you’ve won, and vice versa.
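If you want to run the same check on your own history, the counting is simple. A minimal sketch (the helper name and the W/L string format are mine, not from the spreadsheet):

def length_one_streaks_per_game(results):
  # results: match history as a string of 'W' and 'L', in the order played
  streaks = []
  run = 1
  for prev, cur in zip(results, results[1:]):
    if cur == prev:
      run += 1
    else:
      streaks.append(run)
      run = 1
  streaks.append(run)  # close out the final streak
  ones = sum(1 for s in streaks if s == 1)
  return ones / len(results)

# 'WLWLLWWW' has streaks W, L, W, LL, WWW: 3 of length one in 8 games
print(length_one_streaks_per_game("WLWLLWWW"))  # 0.375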

So apparently it is much easier to detect fakes than you realize.

Second, you can’t both claim that it is nearly impossible to prove rigging, and then say that you have proof of rigging.

Third, about this GM who had some rough games in silver: he still won, didn’t he? If he actually tries hard (and he actually is a GM), he should get back to GM. Also, are you talking about this guy: Bronze to GM (Educational)? I’m really not convinced he is actually a GM, especially on the heroes he was playing in the low-tier matches. By the time he got to plat, he had started saying he is a Mercy main: - YouTube. And there the series ends, nowhere near GM. In other bronze → GM series I’ve seen (that actually finish), players don’t usually lose a game until diamond or so.


When taking an average, variance is not multiplicative. It actually shrinks. More specifically, for players:
mu_a +/- sigma
mu_b +/- sigma
mu_c +/- sigma
mu_d +/- sigma
mu_e +/- sigma
mu_f +/- sigma

average = (mu_a + mu_b + mu_c + mu_d + mu_e + mu_f)/6 +/- sigma / sqrt(6)

This is derived with standard error propagation. Propagation of uncertainty - Wikipedia

The intuition is that when one player has a good day, it is likely that some other player is having a bad day. It is rare for all players to have a bad day simultaneously.
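If the derivation isn’t convincing, it’s easy to check numerically. A minimal sketch, assuming independent players who each swing with a standard deviation of 150 SR around their mean:

import numpy as np

rng = np.random.default_rng(0)
sigma = 150.0                                      # per-player spread, in SR
rolls = rng.normal(0.0, sigma, size=(100_000, 6))  # 100k games, 6 players each
team_avg = rolls.mean(axis=1)                      # team performance = average

print(team_avg.std())      # ~61 SR
print(sigma / np.sqrt(6))  # ~61.2 SR: the spread shrinks by sqrt(6)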

The model explicitly includes large (trolling) swings in performance in the last simulation.

Good to know. Thanks.

Where did you get the 20-30%? What is that based on?

You are drawing your conclusions assuming that the data above is the entire data set, when actually it is only a piece of it. I purposely gave a part of the data that should be flagged as abnormal, but if it were appended to a larger data set (most people play far more matches), it would not be so obvious.

That’s like me saying, “Hey, I played two games today and won both of them,” and you saying something is rigged because I won 100% of the matches. Sure, I won 100% of the matches, but only in that subsection of data; if you look at my past 400 matches, my winrate would not be 100%. OK, so I need to append more data before the above dataset for it to be complete, but you know what I’m talking about; you are just relying on the ignorance of most people on the forum to support your case. There is almost no large data set that could be analyzed and viewed as “statistically abnormal,” and you know that.

Yes, I explained this in my last post. This is done through your model (or it should be), where the ACTUAL variance in a game of 6 people is around 24 if you put an individual variance of 150 in there.

This 150 SR model, however, was an observed variance based on typical SR swings of individuals, I’m assuming.

However, a full-team variance SR swing was not modeled, even at +/- 150. And even then, using the 150 baseline for 1 person as your model would not be accurate for teams, because the SR scaling is based on matching teams basically within 25 SR of each other on their mu (not necessarily their SR, but we’ll assume that SR is equal to MMR and that mu is the unverified “true skill” of the player).

Anyway, again, we don’t have any data except the system’s max acceptable SR range (500) for a grouping to still be considered “fair.” This means someone can consistently throw off a team’s average by 650 SR and still be considered a fair match (500 spread over 5 people, plus 150 internal variance). This can happen on both the low end and the high end, making it possible for a team whose displayed SR is a good 1000 SR off to win, given a +/- of 650.

Now granted, like before, the bell curve would reduce the 150 variance for the team more than for an individual; however, the 500 grouping SR can really throw things off when talking about the group’s individual performance relative to the SR the group has at the time.

So, you would have to be able to model the possibility of one team having an off day while the other team is having a good day. You have modeled a swing of 150, but not a good 650. That’s not even including the possibility that up to 3 people could drastically increase the performance of the team well above where the team is ranked… and the very small probability you put down for “trolls” could not compensate for that, as it could happen very frequently.

The graphs I linked: Overwatch Forums
Second row, third column, first point: about 0.24 +/- 0.03 streaks of length one per game. This is simulated data, which agrees with real data from Porkypine (3rd row) and Des (4th row).

Your data has 0.61 +/- 0.10 streaks of length one per game.

The difference is 0.37 +/- 0.104, or 3.56 sigma, which corresponds to a 0.037% (37 parts in 100,000) chance that your data is consistent with random chance (that is, un-rigged).

Your fake data is so bad that even though there is not a lot of data, it was obvious that it was fake. Significance analysis like I’ve done allows us to determine how confident we are that a result is valid, given a specific data set. Two games wouldn’t be enough, but you gave me more than two games.
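For anyone who wants to reproduce the arithmetic, it is a straightforward two-sample comparison. A sketch (scipy for the normal tail; the inputs are the numbers quoted above):

import math
from scipy.stats import norm

expected, expected_err = 0.24, 0.03  # simulated + real data
observed, observed_err = 0.61, 0.10  # the posted "natural" data

diff = observed - expected
err = math.sqrt(expected_err ** 2 + observed_err ** 2)
z = diff / err      # ~3.5 sigma
p = 2 * norm.sf(z)  # two-sided tail probability, ~0.04%
print(z, p)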

No, you did not. You even bolded “would not be caught.” I’ll quote your entire post this time:

Are we still talking about team queue? Or are we back to solo queue?

Your arguments are very murky, and the math doesn’t quite add up. If you’d like to discuss or object to my model, it’s probably clearer to actually discuss the model.

Win probability is modeled this way:

import itertools
import math
import random

# 'env' is the rating environment; it provides beta (per-player performance
# noise) and cdf (the standard normal CDF).
def true_win_probability(team1, team2):
  # difference between the teams' total true skill
  delta_mu = sum(r.skill for r in team1) - sum(r.skill for r in team2)
  # every player's per-game performance variance, pooled
  sum_sigma = sum(r.inconsistency ** 2 for r in itertools.chain(team1, team2))
  size = len(team1) + len(team2)
  denom = math.sqrt(size * (env.beta * env.beta) + sum_sigma)
  # probability that team1 outperforms team2 in this game
  return env.cdf(delta_mu / denom)

# in the main loop, for each game:
roll = random.uniform(0, 1)
if roll <= true_win_probability(a_players, b_players):
  winner = a_players  # team a wins
else:
  winner = b_players  # team b wins

In solo queue, there are six players in each team. In “team queue” I consider there to be only one “player” in each team, that represents the efforts of the entire team (whose membership does not change).
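Concretely, the same function covers both cases. A minimal sketch, with a hypothetical Rating holder carrying the skill/inconsistency fields the code above reads, and a stub env standing in for the rating environment (all numbers illustrative; the two modes aren’t meant to produce the same probability, this just shows the interface):

import math

class Env:
  beta = 100.0  # assumed performance-noise scale, illustrative only
  @staticmethod
  def cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2)))

env = Env()

class Rating:
  def __init__(self, skill, inconsistency):
    self.skill = skill
    self.inconsistency = inconsistency

# solo queue: six individually rated players per side
solo_a = [Rating(2500, 150) for _ in range(6)]
solo_b = [Rating(2450, 150) for _ in range(6)]

# team queue: one aggregate "player" per side, since membership never changes
team_a = [Rating(2500, 60)]
team_b = [Rating(2450, 60)]

print(true_win_probability(solo_a, solo_b))
print(true_win_probability(team_a, team_b))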

The numbers you came up with are based on the data you simulated and Porkypine’s data. I could play for hours and pull out a specific segment of that data which, when run again, would make my bogus data read as much closer to “not rigged.”

But let’s just pretend that you are 100% correct, that there is no data I could let you train/simulate with that would flag my bogus data as legit. Even then, the point I am making is that the matchmaker would have to be doing something very blatant to be identified as “rigged” through data analysis.

In other words, the bottom line is that it would not only be easy to conceal a rigged matchmaker; most likely the matchmaker was designed to simulate a legit matchmaker precisely so data analysis would not reveal it. I could create a bogus matchmaker coded to stay within the 0.37 +/- 0.104 in order to not be caught.

This is what I’m saying: you know that it can’t be revealed through analysis because it was made that way; that’s why you keep bringing up statistics.

Having said all that, I would like to hear your response to this frequent scenario I used to find myself in (you seem to avoid answering questions around these): I have a new account and am using my best heroes and ranking up. The team needs a Brigitte to stop a Doomfist. I am not great with Brig, but I can at least stop the DF with her, so I just concentrate on shutting down their main player, Doomfist. I do it with success and we win, but I did not heal well with her or do very well on other stats. I am then placed on what feels like a big loss streak. I have witnessed this so much (and with multiple players) that I dare not even switch to heroes that I don’t play frequently, even if changing to that hero would probably cause my team to win. Are you honestly going to say you have no idea what I’m talking about in the above explanation?

I don’t know why you keep bringing up that kind of scenario where you get forced onto losing streaks if you don’t one-trick. That kind of posting history from you caused other players who actually read these forums to try the same kind of stuff, and those other players already had bad attitudes and intentions to begin with, so you’re simply creating a negative feedback loop.

I’m really bad with models… and am relying on you to tell me whether the processes I’m describing are accurately reflected in the equations.

In your team-of-6 model, does it give wins and losses to all 6, then recalculate?

Does it give player variance to all 6 players, or only the player we are tracking?

You can’t treat 1 team of 6 with 6 different variances the same as you would 1 player with a single player variance.

Furthermore, the only way you can treat 1 team as a singular object on par with a solo player is if the individual parts never change.

An example of what would happen if you changed a teammate within a team would be an individual suddenly changing the way they play the game, like switching to a controller or playing with their feet; it can and WILL be that drastic of a change.

Sure, some teams could have everyone pretty close together; then it would be a small variance, like playing while tired. But I think that will be a rarity rather than the norm.

The simple fact is you do have to account for a drastic change in team skill when a player is swapped. That is NOT calculated in your simulations, because your “troll or smurf factor” (which is very small, by the way, for the purpose of teams) isn’t introduced until after the 6 team variances.

For example: your 9-man roster’s TrueSkill is composed of six 2000 players and three 3000 players (just to make it simple). They do placements and place at 2400 like all teams. Then they play 3 games with their six 2000 players, losing all of them, so say they drop to 2300. Then they swap in their 3000 players, giving them an effective team skill of 2500, and they win all their games… maybe even exceeding their normal rate and reaching 2600.

Then they swap their 2000 players back in to play in 2600 games… the game quality turns absolutely horrible, and they go on a massive loss streak down to 2100… maybe swapping in one or two of their other guys to get a win or two in there.

If they keep swapping teammates, the matchmaker will never be able to provide them a fair match, because it will never pin down where the team’s skill is… it will be constantly bouncing between 2000 and 2500 depending on which teammates are in… and that doesn’t even account for the smaller variance in individual skill.

And that is just under the current 1000 SR restriction. In a real clan system, where the team is the only one with SR, there will be no individual skill restriction, which means you could have a team of bronzes with 3 GM ringers.

You would never be able to have a good rating system for clans outside of a structured short-term tournament like Open Division. Or at least not the way the current matchmaker creates matches, because the matches, even at the current SR, would never be equal.

Like, how would you ever find out that the 3 players who are 3k caliber actually brought the team SR up to a true 2500, if they only play (and win) in games the matchmaker set up at, say, 2100? All you’d have is that when the 3 higher players are in, the team has a high win rate… because all they really did was prove they were better than someone at 2100, not better than the people at 2500.

Because what if you had the same win/loss pattern, but with all 9 players being 2000? In their games they might just run into teams that die easily to their strat… like maybe they run GOATS and just roll over people, but around 2600 they run into people who start running Symm/Bastion. Or the meta shifts… or maybe they just ran into lucky matchups.

I didn’t carefully select that data. It was given to me by Des and Porkypine, who likely believed that the data showed rigging, which I disproved.

I’m somewhat inclined to simulate the matchmaker as you believe it works, and then show how it fails basic statistics and sanity checks. However, that is a lot of work and I suspect you would dismiss that work out of hand (because… reasons…), so I’d be wasting my time.

If you’d like to create a simulated bogus matchmaker (since it is so easy), I will analyze the output to look for anomalies.

If this is true, then it would be fairly easy to show with statistics, if you record your SR and hero played for every match. Basically, if you play Fill -> Fill -> Fill -> Main -> Main -> Main, there would be a distortion in your win probability when you switch back from Fill -> Main. Of course, I’d probably need a few hundred games of data to be sure (somewhere between 20 and 50 examples of the event being investigated).
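For what it’s worth, the test itself is simple once the log exists. A sketch, assuming a hypothetical log of (role, won) pairs in match order; the helper name and log format are mine:

from scipy.stats import binomtest

def switch_back_effect(log):
  # log: list of (role, won) tuples in match order, role in {"Fill", "Main"}
  main_wins = [won for role, won in log if role == "Main"]
  baseline = sum(main_wins) / len(main_wins)  # overall Main win rate
  # Main games played immediately after a Fill game
  switch = [cur[1] for prev, cur in zip(log, log[1:])
            if prev[0] == "Fill" and cur[0] == "Main"]
  # is the post-switch win rate consistent with the baseline?
  result = binomtest(sum(switch), len(switch), baseline)
  return baseline, sum(switch) / len(switch), result.pvalue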

Until you provide this data (either real or simulated), rather than blatant assertions, I’m going to stick with the most likely hypothesis: that you have a poor understanding of how numbers, analysis, and randomness work, and are hallucinating the rigging.

In this case, you will likely just have to ask questions and trust my answers.

Yes.

All players have a built-in inconsistency that is fixed at the beginning of the simulation and does not change.

Yes, you can. The code adds the variances at run time. If the variances aren’t changing, then I can just as easily add them together at the beginning.
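In code terms, this is just the associativity of the sum in sum_sigma. A minimal sketch against the simulation code above (team1 / team2 as defined there):

import itertools

# added at run time, as in true_win_probability...
runtime = sum(r.inconsistency ** 2 for r in itertools.chain(team1, team2))

# ...or added once per team at the beginning; the result is identical
team1_var = sum(r.inconsistency ** 2 for r in team1)
team2_var = sum(r.inconsistency ** 2 for r in team2)
assert runtime == team1_var + team2_var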

My model in general assumes that players are static. If the players / teams change by less than tau (set to 24) per game, this assumption is a good one. If players / teams change by more than that, it is a bad assumption, and it will make convergence slower and accuracy worse.

Allowing subs in team play does make my model less accurate (and optimistic), but this compromise is necessary to keep team play functional in a real-world environment.

Open matchmaker play is never going to be ideal. People can choose who to play with, when to play, where to play, etc.