How Competitive Skill Rating Works (Season 11)

SR accuracy

There are a number of ways to approach this question. One is to start a completely new account, play 100 games on it, and see how it performs compared to the old one. This shows that SR can vary by 1000 SR in extreme cases and 500 in normal cases (27). There is some evidence that reroll experiments show less variance at higher ranks (28), likely because there are fewer random variables such as smurfs, throwers, and inconsistent play.

Next, any player can see how his SR changes during a season. A range of 500 is completely normal here.

Off-topic, but…

To be fair… I do not want the government to over-involve itself with video games (again). Some parts of the world are kinda locked out of playing certain games because the game's rating isn't low enough to pass and what not. I’m all for being somewhat anti-lootbox shenanigans and scams coughBattlefront 2cough, but yeah…

+/- 250 in normal cases. +/- 500 in extreme cases.

That’s interesting. In that context, do you have any idea or explanation for my experience? Every season I have played seriously, I would slowly climb up to 3400 or so, only to crash down to 2200 or so, and then the cycle would repeat.
My SR diagram looks like a sawtooth.
I do tend to climb a little higher every time, pushing my all-time high a bit further each cycle, but such huge SR waves can’t be good, right?
To me that just signals that the SR in this game is wildly inaccurate.

I would really like to know what causes this behavior, or at least understand it.


The amount of swing varies based on play style and hero choice. Your profile is locked, so I can’t see your hero choice. However, it is common for players with such high swing to be Mercy mains. Mercy is a problematic hero, because she depends so much on her team. Carrying with Mercy is difficult at best. This means that in most games you are flipping a coin to see which teammates you get, and therefore whether you win or lose, and you can only occasionally flip the outcome with your own skill. This leads to higher variance in skill rating.
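To make the "coin flip" point concrete, here is a toy model, not Blizzard's system. It assumes each game is worth a flat 24 SR (the "neutral game" figure quoted later in this thread) and that your win chance leans toward your true skill by an amount set by a hypothetical `influence` parameter. Low influence stands in for a team-dependent hero like Mercy: the pull back toward your true skill is weak, so SR wanders much further.

```python
import random
import statistics

def simulate_sr(true_skill, start_sr, influence, n_games, step=24, seed=0):
    """Toy SR random walk (hypothetical model, not the real matchmaker).

    influence: how strongly one player can tilt a match toward their true skill.
    A small value keeps every game close to a 50/50 coin flip.
    """
    rng = random.Random(seed)
    sr = start_sr
    history = []
    for _ in range(n_games):
        # Below your true skill you win a bit more often, above it a bit less.
        p_win = 0.5 + influence * (true_skill - sr) / 1000
        p_win = min(max(p_win, 0.05), 0.95)  # keep it a real probability
        sr += step if rng.random() < p_win else -step
        history.append(sr)
    return history

high = simulate_sr(true_skill=2800, start_sr=2800, influence=1.0, n_games=2000)
low = simulate_sr(true_skill=2800, start_sr=2800, influence=0.1, n_games=2000)

# Standard deviation of the SR history: a rough measure of how big the swings are.
print("high influence SR spread:", round(statistics.pstdev(high)))
print("low influence SR spread:", round(statistics.pstdev(low)))
```

The exact numbers depend on the seed, but the low-influence run should show a noticeably wider spread even though both "players" have the same true skill.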

As to early season being easier than later season, I have noticed the same thing, though my swings are less dramatic. I have theories, but I have no way to prove them.

  1. One possibility is that casual players play early season till they get frustrated and quit. This leads to skill creep over the season.

  2. Another possibility is that high level players play their mains early in the season until they reach their skill cap, and then they boost up smurf accounts. That means that anyone who belongs in Plat, for example, will have a significant headwind.

  3. Late season people have reached their competitive point goals, and are more likely to screw around, leading to more randomness in matches.

Whatever is going on, two month seasons reduced the problem from where it was with three month seasons.


I’d like to add something about the difference between accuracy and precision, if I may. The common usage mixes the terms but to discuss what I want to discuss I need to separate the ideas.

If I were to tell you that my location was California, that would be accurate, but not very precise. Similarly, to say that your skill is between 2200 and 3400 also is likely accurate, but not very precise.

You are talking more about precision. The question then becomes one of the interface between you and the measuring system (SR). If you were trying to find me, I could give you my address, and that would be precise enough for the task. What I couldn’t do is just give you my zip code. You’d never find me that way; it isn’t precise enough. On the other hand, I could also give you a 12-digit latitude/longitude coordinate. There are literally billions of those coordinates that would accurately describe my position, and any one of them would work as long as I didn’t move.

If I was walking around my house, or even doing yardwork, by the time you got to the 12-digit coordinate it would likely be wrong. You’d be better served by the address.

Kaa’s experiment in reference (27) assumes that the precision is 4 digits, and so it discusses accuracy in terms of the two accounts being 476 SR apart. He’s not wrong (SR is 4 digits, so it’s most reasonable to conclude that that precision is meaningful), but he could just as well have said that the system gave the same (accurate) result if both accounts had ended up in Gold tier. If we consider that SR is likely over-precise, then the question of accuracy becomes a bit different. How different is 476 SR, really? The answer probably depends on where that 476 SR range falls on the overall scale.
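Here is a small sketch of reading SR at tier precision instead of 4-digit precision, using the tier boundaries Overwatch used at the time (Bronze below 1500, then a new tier every 500 SR up to Grandmaster at 4000). The account SR values in the usage lines are made up; they only show that the same 476 SR gap can land in one tier or straddle two.

```python
# Overwatch competitive tier floors (SR runs from 1 to 5000), lowest first.
TIERS = [
    (1, "Bronze"),
    (1500, "Silver"),
    (2000, "Gold"),
    (2500, "Platinum"),
    (3000, "Diamond"),
    (3500, "Master"),
    (4000, "Grandmaster"),
]

def sr_to_tier(sr):
    """Return the name of the highest tier whose floor this SR reaches."""
    name = TIERS[0][1]
    for floor, tier in TIERS:
        if sr >= floor:
            name = tier
    return name

def same_tier(sr_a, sr_b):
    """True if two SR values agree at tier-level precision."""
    return sr_to_tier(sr_a) == sr_to_tier(sr_b)

# Hypothetical pairs of accounts, each 476 SR apart.
print(same_tier(2010, 2486))  # both Gold
print(same_tier(2300, 2776))  # Gold vs Platinum
```

Whether a 476 SR gap "agrees" depends entirely on where it sits relative to a tier boundary, which is the point about the range's position on the scale.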

This is where your hero and playstyle come in (and perhaps the other factors Kaa gives). If you, personally, aren’t that consistent or you play in a way that isn’t that consistent, or you play a hero that relies on factors beyond your control, then what you are doing is very much like moving around your house while someone is looking for you with a 12 digit coordinate. You keep updating the number, but a few games later it’s wrong.

Measurement systems should never be more precise than they are accurate. You wouldn’t want to use the 12-digit coordinate to find me in my house; it would only be right by sheer dumb luck. If you look at coordinate x,y and find only empty space, you are simply inaccurate. Wrong. You do of course want the numbers to be meaningful (a zip code wouldn’t work), but generally it’s important to match your precision to your desired accuracy.

The SR system fails dramatically in this respect. I’ve never heard anyone claim they could tell a difference in skill level below a 200 SR difference. There are 5000 SR levels currently, so we could take 5000, divide it by 200, and get essentially 25 meaningful tiers of game play in the MOST PRECISE version. According to Kaawumba’s experiment, we should take that 5000 and divide it by 500 to 1000, which leaves only 5 to 10 tiers. Consider whether it is meaningful to have an SR difference of 1. If not, there probably shouldn’t BE the possibility of an SR difference of 1. It means absolutely nothing.
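The tier arithmetic above, plus what "matching precision to accuracy" would look like in practice, can be sketched as follows. `quantize` is a hypothetical helper that rounds SR to the nearest multiple of an assumed resolution, so SR values 1 apart collapse to the same reported number.

```python
SR_RANGE = 5000  # SR runs from 1 to 5000

def meaningful_tiers(resolution):
    """How many distinguishable skill bands fit in the SR range."""
    return SR_RANGE // resolution

def quantize(sr, resolution):
    """Round SR to the nearest multiple of `resolution` (halves round up)."""
    return resolution * ((sr + resolution // 2) // resolution)

# 200 SR: smallest difference anyone claims to notice.
# 500 and 1000 SR: the reroll experiment's normal and extreme spread.
for res in (200, 500, 1000):
    print(res, meaningful_tiers(res))  # 25, 10 and 5 bands respectively

# At 200 SR resolution, a 1 SR difference disappears entirely.
print(quantize(2538, 200), quantize(2539, 200))
```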

In a sense, we already have these tiers. If you forget the over-precise 4-digit number and understand your rank only in terms of tiers, I think you would find your results to be a bit more accurate, but THEN you have to consider not just YOUR consistency, but what “2500” or “Platinum Tier” even MEANS. The 3 possibilities Kaa just posted are all variations on the same theme, which is that the skill level that “2500” corresponds to may change throughout the season. It’s a ranking system. A RELATIVE number. You’re not a Platinum player; you’re a player who is better than the people in Gold but not quite as good as those in Diamond. It may seem like a pedantic difference, but it will help you to understand.

So even if the system was perfectly accurate and precise to 4 digits and you played perfectly consistently, it’s still possible for your SR to go up and down through the season. Not that it necessarily would swing so dramatically, but it could certainly be a factor.
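A toy illustration of the "relative number" point: a player whose skill never changes can still slide down the ladder if the field around them gets stronger, for example if weaker casual players quit mid-season. The skill values and populations below are invented for illustration only.

```python
from bisect import bisect_left

def percentile_rank(skill, population):
    """Fraction of the population with skill strictly below `skill` (0.0 to 1.0)."""
    ranked = sorted(population)
    return bisect_left(ranked, skill) / len(ranked)

# Hypothetical skill levels of the active player base.
early_season = [1800, 2100, 2300, 2500, 2600, 2900, 3200, 3500]
late_season = [2300, 2500, 2600, 2900, 3200, 3500]  # the two weakest players quit

my_skill = 2700  # unchanged all season
print(percentile_rank(my_skill, early_season))  # 0.625
print(percentile_rank(my_skill, late_season))   # 0.5
```

Same player, same skill, lower relative standing, and a ranking system will reflect that as lost SR.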

Hope that helps you to understand it.

I play almost every hero; my most played hero is Pharah, though.

SR is very accurate. On my original account (this one) I have only made it above 500 once. On my other account I am high Silver/low Gold at the start of the season. So within 2000 SR means it’s accurate, right?


Playing too many heroes can lead to higher volatility as well. The more consistent your play (including hero choice) the more stable your rating. It would be easier for me to analyze if you make your profile public. You may find that you have low win rates with certain heroes, and play them too much.

Please post from your other account, and make sure that its profile is public.

I continue to gather data.
Have a look at the team SR differences here:

https://docs.google.com/spreadsheets/d/1TBdpG3ahtD31QZ0Xn6HMygxruuIM1285n2cHCNHsNwg/edit?usp=sharing

The tendency has clearly changed since my previous comment about the issue. I am not playing at my usual SR right now, thanks to an insane losing streak. My hypothesis stands: the matchmaker manipulates your win chance by shifting team SR. Right now it is actually pushing me up; the games are MUCH easier and more coordinated than they were even 3 days ago. The explanation: the team SR difference has been in my favor almost constantly.
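For anyone logging the same kind of data, here is a sketch of the per-game "team SR difference" number. It assumes team SR is the simple average of the six players' SRs (the average the loading screen displayed); how the linked spreadsheet computes it is not stated here, and the SR values below are made up.

```python
from statistics import mean

def team_sr_difference(team, enemy):
    """Average team SR minus average enemy SR.

    Positive: the SR difference is in your favor. Negative: you are the underdog.
    """
    return mean(team) - mean(enemy)

# Hypothetical lobby.
my_team = [2450, 2500, 2520, 2480, 2510, 2540]
enemy_team = [2470, 2490, 2500, 2480, 2500, 2500]
print(team_sr_difference(my_team, enemy_team))  # 10
```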

This is really good. For a multi-million dollar company, Blizz doesn’t like to properly post statistics or documentation, whether for Competitive as a whole or for character information. The hero gallery should have a breakdown of the damage of every ability and the standard damage of each character, plus a small gif showing what each ability does. (I’ve seen new D.Vas who didn’t know anything about Defense Matrix.) Your entire post should be pinned to the top of the Competitive Forum, or Blizzard should make its own version, but the mods are too busy deciding whether some slightly argumentative post needs to be locked.


Again:

You have enough data so that we can start to see this in action.

Here is your data from season 11:

As your team SR becomes higher [lower] than your enemy SR, you gain less [more] points on a win, and lose more [less] points on a loss. The trend lines are clear, though noisy due to the effect of performance SR. The trend lines are also fairly slight, reflecting the rather slow change in expected win percentage with differences in SR.
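The trend lines here are ordinary least-squares fits of SR change against SR differential, done separately for wins and for losses. A minimal sketch of that fit, assuming you pass in matched lists for one outcome at a time (the noiseless points in the usage line are only a sanity check, not real games):

```python
def fit_trend(sr_diff, sr_change):
    """Least-squares line: sr_change ~ intercept + slope * sr_diff.

    Call it once with only wins and once with only losses, as in the plot.
    Returns (slope, intercept).
    """
    n = len(sr_diff)
    mean_x = sum(sr_diff) / n
    mean_y = sum(sr_change) / n
    sxx = sum((x - mean_x) ** 2 for x in sr_diff)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sr_diff, sr_change))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Sanity check on points lying exactly on a line.
xs = [-30, -10, 0, 10, 30]
slope, intercept = fit_trend(xs, [24 + 0.07 * x for x in xs])
print(round(slope, 4), round(intercept, 4))
```

With real games, performance SR adds scatter around the line, which is why the fitted slopes come out small and noisy.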

If I divide this into quadrants, you have:
Top Right: Wins where you were expected to win: 23 +/- 4.8
Bottom Right: Losses where you were expected to win: 24 +/- 4.9
In theory you should have won more than you lost, but the error bars are big.

Top Left: Wins where you were expected to lose: 21 +/- 4.6
Bottom Left: Losses where you were expected to lose: 32 +/- 5.7
It seems you have lost more than you should have, but the error bars are big.
Maybe you are tilting when you see that you are underwater with respect to SR. But the error bars are pretty large. Maybe you’ve just been unlucky.
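The ± figures on the quadrant counts above match the usual counting (Poisson) error, the square root of the count. A short sketch that reproduces them and also turns each half into a win rate with a binomial standard error, which shows how wide the uncertainty still is after about 50 games per side:

```python
from math import sqrt

# Quadrant counts from the season 11 data above.
quadrants = {
    "expected win, won": 23,
    "expected win, lost": 24,
    "expected loss, won": 21,
    "expected loss, lost": 32,
}

for label, n in quadrants.items():
    # Counting error on a tally of n independent events is about sqrt(n).
    print(f"{label}: {n} +/- {sqrt(n):.1f}")

def win_rate(wins, losses):
    """Win fraction and its binomial standard error."""
    n = wins + losses
    p = wins / n
    return p, sqrt(p * (1 - p) / n)

print("expected-win games: %.2f +/- %.2f" % win_rate(23, 24))
print("expected-loss games: %.2f +/- %.2f" % win_rate(21, 32))
```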

Breaking these numbers into bins of 10 SR:

I’d love for this plot to have the clear expected trend line, but it doesn’t and the error bars are way too big. But hey, if you’d like to play a couple hundred more games and give me the data, I’ll happily crunch it. I’ve wanted plots like these for a long time, and this is the first time I’ve seen data that has even started to approximate what Scott is describing.
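For anyone wanting to reproduce the binned plot from their own spreadsheet, here is a sketch of grouping games into 10 SR bins of SR differential and computing a win rate per bin. The input format, a list of `(sr_diff, won)` pairs, is an assumption. Note that a bin with all wins or all losses gets a standard error of 0, which badly understates the uncertainty for small bins; that is part of why a couple hundred more games would help.

```python
from collections import defaultdict
from math import floor, sqrt

def bin_win_rates(games, width=10):
    """Group (sr_diff, won) pairs into bins `width` SR wide.

    Returns {bin_start: (win_rate, std_error, n_games)}, sorted by bin.
    """
    bins = defaultdict(lambda: [0, 0])  # bin_start -> [wins, games]
    for sr_diff, won in games:
        start = floor(sr_diff / width) * width  # e.g. -15 falls in the -20 bin
        bins[start][1] += 1
        if won:
            bins[start][0] += 1
    result = {}
    for start in sorted(bins):
        wins, n = bins[start]
        p = wins / n
        result[start] = (p, sqrt(p * (1 - p) / n), n)
    return result

# Hypothetical games: (team SR minus enemy SR, did we win?)
sample = [(-15, True), (-12, False), (3, True), (8, True), (12, False)]
for start, (p, se, n) in bin_win_rates(sample).items():
    print(f"[{start}, {start + 10}): {p:.2f} +/- {se:.2f} over {n} games")
```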

Your data representation is very informative, thanks for the effort.
By “you were expected to win (lose)”, you mean games where the SR difference was in my favor? It seems Scott has something like that in mind.
Your second graph clearly shows that:

  • Either PBSR bonuses and team difference bonuses are insufficient to compensate for the SR differential.

  • Or it is a huge mistake to allow an SR differential past 10-15, because average players, like myself, suffer from it.

I strongly argue for both. For example:

https://imgur.com/a/Driy7vg

Two games, same heroes, same SR. The team SR difference actually (I just can’t find other words for it) robbed me of 6 SR. Was it necessary to create a game with such a large team SR difference in the first place? The reward for my personal performance was obliterated, even though I know I played very well in both matches and was on fire most of the time. The second match actually had worse K/D and fewer total eliminations, but I got rewarded for it. And this is the most obvious example. Another one:

https://imgur.com/a/kfmtwd4

Matches without an SR difference (1 or -1 isn’t much). The one with an insane performance was lost. What’s the bonus? NONE. The one with a good performance was won. What’s the bonus? At least 4 SR. What is the incentive to put effort into games??

I’ll continue to update my spreadsheet, the link is permanent. You are free to use the data as you see fit, but I would love to get mentioned somewhere near your analysis of it.

Yes.

The win probability is limited to between 40% and 60%. That seems restrictive enough to me. I don’t know what SR differential that represents.

(enemy_SR_B - team_SR_B) - (enemy_SR_A - team_SR_A) = 20 SR. Reading from my first plot, that means the difference in expected win percentage contributed ~20 × 0.0697 ≈ 1.4 SR to your change in SR upon victory. Since there was actually 6 SR of difference, the majority of that likely came from performance metrics. Despite substantial effort, I haven’t been able to quantify performance metrics. It’s certainly not as simple as K/D, and the developers have said that “on fire” is not how they derive the number. See the “Performance Modifier” section of my original post for more information. Performance metrics generally just show up as noise in my plots; here, they appear as a significant amount of scatter in the first plot.
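The arithmetic above as a tiny sketch: multiply the change in (enemy minus team) SR between the two games by the slope read off the win plot, and attribute whatever is left of the observed 6 SR gap to performance and other factors. The slope is an eyeballed fit, so treat the split as rough.

```python
SLOPE = 0.0697  # SR gained per point of (enemy - team) SR, read off the wins plot

def differential_component(delta_differential, slope=SLOPE):
    """SR attributable to a change in (enemy SR - team SR) between two games."""
    return slope * delta_differential

from_differential = differential_component(20)  # the two games differ by 20 SR
observed_gap = 6                                  # SR difference between the two wins
print(f"from SR differential: {from_differential:.1f} SR")
print(f"left for performance and other factors: {observed_gap - from_differential:.1f} SR")
```

That gives roughly 1.4 SR from the differential and roughly 4.6 SR unexplained by it.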

A “neutral” game, with no expected win differential, no performance modifier, and no other funny business, gives +/- 24 SR (at low Diamond, and probably true here as well). So you gained 2 SR due to performance on your loss, and gained 3 on your win. So, more than none. But making the performance modifier bigger tends to reward stat hunting rather than winning and team play. Again, see the “Performance Modifier” section in the original post.
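A sketch of that comparison: subtract the ±24 SR neutral baseline for the same result from the actual SR change. The example inputs (a 22 SR loss and a 27 SR win) are simply the changes that would produce the +2 and +3 figures above; they are not quoted from the screenshots. This also ignores the small SR-differential term, so on lopsided games the offset mixes both effects.

```python
NEUTRAL_SR = 24  # SR won or lost in a neutral game at low Diamond, per the reply above

def performance_offset(sr_change, won, neutral=NEUTRAL_SR):
    """SR gained (+) or lost (-) relative to a neutral game with the same result."""
    baseline = neutral if won else -neutral
    return sr_change - baseline

print(performance_offset(-22, won=False))  # lost 22 instead of 24
print(performance_offset(27, won=True))    # won 27 instead of 24
```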

Because you like winning? Because you’ll rank up if you win more? Because putting effort in makes it more fun and valuable to you?

Thanks. I always reference my source data, but sometimes it is behind a link or two.

@kaawumba I got my NiteOwl alt during this last sale. I haven’t played on it yet (I moved recently, don’t even have a desk right now) but before I relegate it to my “I know I’m playing like crap right now” alt I’d be more than willing to run an experiment or two. Just let me know. So far zero games played in any mode. Honestly, I don’t even think I’ve logged into OW with the acct. I bought it and shut down my computer…

Probably the most useful thing for me would be if you try-hard when doing placements and up until you start gaining/losing ~ 24 SR per game. Maybe 30 games, including placements. Do it all in one season. Also write down the ranks (gold/plat/unranked/etc.) of each opponent and ally. Unfortunately, with private profiles, you can’t really track everyone’s SR reliably.

Essentially, you’d be adding more data to my post “Initial Competitive Skill Rating, Decrypted” and its companion Google Sheet of the same name. Though as I’ve said, you can’t track other players’ SR anymore, so I wouldn’t bother going through profiles like I did.

I don’t care how you level to 25, but let me know what you do.

LFG adds unknown factors (I’m currently either exploiting the LFG system, or finally getting the recognition I deserve, or LFG just uniquely suits my play style, or being lucky, not sure which), so please avoid that for the 30 games, and solo queue only.
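If it helps with the bookkeeping, here is one way to spot the point in your log where SR changes have settled to roughly ±24 per game, looking at the games after placements. The tolerance (±3 SR) and the run length (5 games in a row) are arbitrary choices for illustration, not anything the system defines, and the sample log is made up.

```python
def settled_after(sr_changes, target=24, tolerance=3, window=5):
    """Index of the first game that starts a run of `window` games whose
    |SR change| is within `tolerance` of `target`; None if that never happens."""
    for i in range(len(sr_changes) - window + 1):
        run = sr_changes[i:i + window]
        if all(abs(abs(change) - target) <= tolerance for change in run):
            return i
    return None

# Hypothetical post-placement log: big early swings that shrink toward ~24.
log = [60, -55, 48, 40, -35, 30, 26, -25, 23, 24, -22]
print(settled_after(log))  # 6
```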

Thanks.


One other thing: it’s easiest to record team and enemy SR and ranks by taking screenshots (Print Screen on Windows). Just don’t be doing something on your other monitor when trying to take the screenshot, or it won’t work.

Kaawumba, I have 171 games in my spreadsheet now. Could you please update your graph of SR change vs SR differential?

Here you go:

I made the histogram in MATLAB instead of Excel this time, for more power and flexibility.