Too long, didn’t read
If your SR over time looks like the “Skill Rating vs Game, CDF Statistics” plot in Win/Loss Simulation and Data - Google Sheets, that does not mean that Blizzard’s matchmaking is broken. Random variables, however, are not your friend.
This is a repost from the legacy forums, which were shut down; the post itself still applies. Since then, I have created a much more extensive simulation, posted at https://www.reddit.com/r/OverwatchUniversity/comments/aatezy/why_match_quality_is_frequently_poor/.
Introduction
There have been a great many posts lately implying that because there are win/loss streaks, and reversals of those streaks, there must be some sort of illicit meddling by Blizzard. In general, these posts show a lack of understanding of how random numbers work, as links such as https://wizardofodds.com/image/ask-the-wizard/streaks.pdf demonstrate. However, that link and others like it are not focused on Overwatch specifically, so I ran a more relevant simulation. The graphs are at Win/Loss Simulation and Data - Google Sheets, and the text description is below.
Coin-flip statistics
First, consider a player who has a 50% chance of winning and a 50% chance of losing each game, and who plays 1000 games (starting at 2500 SR, +25 for a win, -25 for a loss). This is coin-flip statistics. The player’s SR over time looks like the plot “Skill Rating vs Game, Coin-flip Statistics”. Note that even though the player’s starting rating is 2500, it dives all the way down to 1200 before working its way back up. If we tried to ascribe a narrative to this, we would say that a huge weight held the player down for ~500 games and was then partly lifted. However, since we can look at the code, we can verify that there was no such effect: it was only coin-flip chance. We also know that in this model there is no such thing as a “true rating”, because each game is independent of the past. Running the simulation many times gives a different, random final SR each time.
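For anyone without MATLAB, here is a minimal Python sketch of this coin-flip model. It is an illustration, not the MATLAB code linked at the end of this post; the function name coin_flip_sr and its parameters are hypothetical, set to the numbers above (start at 2500, ±25 per game, 1000 games).

```python
import random


def coin_flip_sr(n_games=1000, start=2500, step=25, seed=None):
    """Coin-flip model: every game is an independent 50/50 win or loss.

    Returns (sr, results): sr has n_games + 1 entries (SR before the first
    game, then after each game); results holds True for a win, False for a loss.
    """
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    sr = [start]
    results = []
    for _ in range(n_games):
        win = rng.random() < 0.5  # win probability never depends on SR
        results.append(win)
        sr.append(sr[-1] + (step if win else -step))
    return sr, results


# Same "player", different seeds: the lowest point and the final SR differ
# from run to run even though nothing about the player changed.
for seed in range(3):
    sr, _ = coin_flip_sr(seed=seed)
    print(f"seed {seed}: lowest {min(sr)}, highest {max(sr)}, final {sr[-1]}")
```

Nothing in the loop pulls SR back toward 2500, which is why the final value wanders from seed to seed.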
Not only does the overall trend go through large swings, but there are also many streaks, as the plot “Streak Frequency, Coin-Flip Statistics” shows. Streaks of up to 6 games are common, and streaks as long as 15 occurred in this simulation.
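Counting streaks is just run-length counting. A minimal Python sketch (hypothetical helper names, not taken from the original code) that treats any run of identical results as one streak:

```python
from collections import Counter
from itertools import groupby


def streak_lengths(results):
    """Lengths of every uninterrupted run of identical results, in order.

    A lone win between two losses counts as a streak of length 1.
    """
    return [len(list(run)) for _, run in groupby(results)]


def streak_frequency(results):
    """Histogram: streak length -> number of streaks of that length."""
    return Counter(streak_lengths(results))


# W W L W W W L L
games = [True, True, False, True, True, True, False, False]
print(streak_lengths(games))                    # [2, 1, 3, 2]
print(sorted(streak_frequency(games).items()))  # [(1, 1), (2, 2), (3, 1)]
```

Feeding it the results of a long simulated run and plotting the counts against streak length gives a histogram of the same kind as the streak frequency plots.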
The final plot shows the autocorrelation function, which measures how strongly the result of one game is correlated with the result of a game some number of games earlier or later. Roughly: “Given that a player won game 0, how much more or less likely is it that he won each game in the past and in the future?” As the model dictates, we get the expected result: the current game has no influence on future or past games. (If a win at game 0 guaranteed a loss at game 1, there would be a spike to -1 at x=1.)
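A minimal Python sketch of one standard autocorrelation estimator, assuming wins are encoded as +1 and losses as -1 (the original MATLAB analysis may normalize differently, and the function name is hypothetical). For this estimator a lag of -k uses exactly the same pairs of games as +k, so the past half of such a plot mirrors the future half.

```python
def autocorrelation(results, max_lag):
    """Sample autocorrelation of a win/loss sequence for lags 0..max_lag.

    Wins are encoded as +1 and losses as -1. Lag 0 is always 1. At lag k,
    game i is paired with game i + k: positive values mean results tend to
    repeat k games later, negative values mean they tend to flip.
    """
    x = [1.0 if win else -1.0 for win in results]
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(results) - 1")
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        raise ValueError("every game has the same result; autocorrelation is undefined")
    acf = []
    for k in range(max_lag + 1):
        # Average product over the n - k available pairs at this lag
        cov = sum((x[i] - mean) * (x[i + k] - mean) for i in range(n - k)) / (n - k)
        acf.append(cov / var)
    return acf


# Perfectly alternating W L W L ...: every win is followed by a loss
print(autocorrelation([True, False] * 5, 2))  # [1.0, -1.0, 1.0]
```

Run on coin-flip results, every value past lag 0 should hover near zero.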
Cumulative normal distribution function statistics
Of course, it isn’t really coin-flip statistics. As a player goes up in rating, wins become harder; as he goes down, wins become easier. I ran a second simulation in which the win probability follows a falling cumulative normal distribution curve (mu = 2500, sigma = 500), i.e. P(win) = 1 - Φ((SR - 2500)/500), as shown in “Win Probability vs Skill Rating, CDF Statistics”. Put simply, if the player’s rating is 2500, his chances are 50/50. If his rating is 5000, his win rate is essentially zero, and if his rating is 0, it is essentially 100%. In between there is a smooth s-curve from 100% down to 0%.
This modification fixes the problem of random SR drift, and SR averages out at 2500 over time. However, there are still large trends down and up. From game 425 to 575, the player gains more than 400 SR. He then falls all the way back down in about 125 games. Throughout, there are many large and small swings. Yet the underlying math never changes, and his “true rating” remains 2500. Uninterrupted streaks are slightly shorter, but still frequent and long, as the plot “Streak Frequency, CDF statistics” shows.
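The only change from coin-flip statistics is that each game’s win probability now depends on the current SR. A minimal Python version of the CDF model, assuming the same ±25 SR per game; the names are hypothetical, and Φ is computed with the standard library’s math.erfc rather than a statistics toolbox:

```python
import math
import random


def win_prob(sr, mu=2500.0, sigma=500.0):
    """Falling normal-CDF s-curve: 1 - Phi((sr - mu) / sigma).

    Gives 0.5 at sr == mu, approaches 1 far below mu and 0 far above it.
    Uses the identity 1 - Phi(z) = erfc(z / sqrt(2)) / 2.
    """
    z = (sr - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))


def cdf_sr(n_games=1000, start=2500, step=25, seed=None):
    """SR random walk where each game's win chance depends on current SR."""
    rng = random.Random(seed)
    sr = [start]
    results = []
    for _ in range(n_games):
        win = rng.random() < win_prob(sr[-1])
        results.append(win)
        sr.append(sr[-1] + (step if win else -step))
    return sr, results


print(win_prob(2500), win_prob(5000), win_prob(0))
sr, _ = cdf_sr(n_games=100_000, seed=0)
print("average SR over the run:", sum(sr) / len(sr))
```

Above 2500 the player loses more often than he wins and below it the reverse, which is the pull that keeps the long-run average near 2500 while still allowing multi-hundred-SR swings.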
Here the autocorrelation looks essentially the same, indicating no measurable correlation between games. However, if I run the simulation with 100000 games instead of 1000, a very slight negative correlation appears. It reflects the increased difficulty of winning as rating goes up (and the decreased difficulty as rating goes down).
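Why does the negative correlation appear only in long runs? Near 2500, the slope of the win curve is φ(0)/σ ≈ 0.4/500 ≈ 0.0008 per SR point, so one +25 win trims the next game’s win chance by roughly 2 percentage points (and a loss raises it by the same amount). That effect is real but small: with 1000 games it is buried in noise of roughly ±0.03 on each autocorrelation value, while 100000 games shrink the noise to roughly ±0.003. The Python sketch below (hypothetical function name; same assumed model as described above: ±25 per game, mu = 2500, sigma = 500) runs the CDF model at both lengths and prints the lag-1 autocorrelation next to that rough noise level. The 100000-game estimate should come out slightly negative.

```python
import math
import random


def lag1_autocorrelation_cdf(n_games=100_000, start=2500, step=25,
                             mu=2500.0, sigma=500.0, seed=0):
    """Simulate the CDF model and return the lag-1 autocorrelation
    of its +1/-1 win/loss sequence."""
    rng = random.Random(seed)
    sr = start
    x = []
    for _ in range(n_games):
        # 1 - Phi((sr - mu) / sigma), written with erfc
        p_win = 0.5 * math.erfc((sr - mu) / (sigma * math.sqrt(2)))
        win = rng.random() < p_win
        x.append(1.0 if win else -1.0)
        sr += step if win else -step
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1)) / (n - 1)
    return cov / var


for n in (1_000, 100_000):
    r1 = lag1_autocorrelation_cdf(n_games=n)
    # For uncorrelated games, the estimate's noise is roughly 1/sqrt(n)
    print(f"{n:>7} games: lag-1 = {r1:+.4f}, noise ~ {1 / math.sqrt(n):.4f}")
```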
Real Game Data
Thanks to Porkypine and Des, I was able to analyze the game data at OW ELO hell - Google Sheets and compare it to my model. The results are in the third and fourth rows of charts. All of the charts match the CDF plots within the error bars. Win Probability vs Skill Rating is consistent with a cumulative distribution (or, for that matter, a straight line). Skill Rating vs Game shows the same sort of swings. Streak Frequency falls off the same way. The win/loss autocorrelation function is one for the same game and zero (within measurement error) for other games. This indicates that win probability does not depend on past games, and that Blizzard is not taking past games (including streaks) into account when matchmaking in any measurable way, apart from wins leading to a slight difficulty increase and losses to a slight difficulty decrease through the rating change itself.
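The post does not say exactly how the error bars on the real data were computed, so the Python sketch below shows only one common approach, as an assumption: win rate per SR bin with a binomial standard error, plus the usual ±1.96/√N band inside which an autocorrelation value at a nonzero lag counts as “zero within errors” for uncorrelated games. The function names and the 250-SR bin width are hypothetical. When the SR list includes the value after the last game (as a simulated walk does), pass everything except its final entry as sr_before_game.

```python
import math
from collections import defaultdict


def win_rate_by_sr(sr_before_game, results, bin_width=250):
    """Group games by SR going into each game.

    Returns {bin_start: (win_rate, standard_error, n_games)}, where the
    error bar is the binomial standard error sqrt(p * (1 - p) / n).
    """
    games = defaultdict(int)
    wins = defaultdict(int)
    for sr, win in zip(sr_before_game, results):
        b = (sr // bin_width) * bin_width  # lower edge of this SR bin
        games[b] += 1
        wins[b] += int(win)
    table = {}
    for b in sorted(games):
        n = games[b]
        p = wins[b] / n
        table[b] = (p, math.sqrt(p * (1 - p) / n), n)
    return table


def acf_noise_band(n_games):
    """Approximate 95% band for autocorrelation at nonzero lags when games
    are truly uncorrelated: about +/- 1.96 / sqrt(n_games)."""
    return 1.96 / math.sqrt(n_games)


print(win_rate_by_sr([2400, 2425, 2400, 2600, 2625],
                     [True, False, True, True, False]))
print(acf_noise_band(1000))
```

With only a few hundred games in a bin, the error bar on its win rate is several percentage points wide, which is why a straight line and an s-curve can both fit the real data.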
If anyone has more data that they would like me to look at for weirdness or otherwise, I would be happy to do so.
If anyone would like to play with the code, it is at win_loss_simulation.m - Pastebin.com and win_loss_analysis.m - Pastebin.com.
To run it, you will need MATLAB with the Statistics Toolbox and the Curve Fitting Toolbox.