Glicko / Trueskill instead of ELO for SR?

Hi all,

This goes out to the actual developers of the SR system of Overwatch. But others with experience in statistical analysis of strategy or similar games will also find this discussion interesting.

I will try to make this sound a bit less technical for everyone to understand, but I feel this is a very important change for a game like Overwatch to adopt, and make the competitive SR ranking way more intuitive, as well as, mathematically rigorous.

It is a well known problem with ELO ratings that they are based on a constant variance term, say, V - a term which signifies how accurate a player’s rating is. Of course, such a term is far from constant in reality. A player returning to the game after a long hiatus will have a larger value of V. Also players who have not played enough games to determine their ranking accurately (hence 10 placement games instead of just 1 or 2) would have a larger V. However, a player’s skill level could be extremely high even if they haven’t played the game for too long. Or, a strong player might have a second or third account. Or a weaker player took a long break from comp, practiced a lot in quick play / training / custom games, and ended up getting really strong in a few months. In such cases, it is necessary for the SR system to correctly gauge their performance and place them in the correct skill bracket as quickly as possible. Unfortunately ELO doesn’t achieve that.
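
To make the "constant term" point concrete, here is a minimal sketch of the textbook Elo update. The K value of 32 and the ratings in the loop are assumptions picked for illustration, not anything Overwatch is known to use. The thing to notice is that one game can never move a rating by more than K, no matter how unsure the system should be about the player.

```python
def elo_expected(r_a, r_b):
    """Expected score of player A against player B (standard Elo logistic curve)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=32.0):
    """Return A's new rating. score_a is 1 for a win, 0.5 for a draw, 0 for a loss.

    k is the fixed step size; 32 is an illustrative assumption.
    """
    return r_a + k * (score_a - elo_expected(r_a, r_b))


# A hypothetical underrated player beating much stronger opponents five times in a row.
rating = 1500.0
for _ in range(5):
    rating = elo_update(rating, 2500.0, 1.0)

# Each win is worth at most k points, so five wins add at most 5 * 32 = 160.
print(round(rating, 1))
```

With a fixed K the only way to speed this up is to raise K for everyone, which makes settled ratings noisier too.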

Since a (proper) matchmaking system exists in Overwatch, one other problem with the ELO system is somewhat circumvented. Though, it is still not mathematically accurate to say that two players with around the same ELO are of the same skill. This is because, a player’s skill variance is an important factor missing from this calculation.

Also, a thing like ELO hell - if it exists - exists only in the ELO system.

Glicko/Trueskill are systems where a player’s skill rating is calculated accurately considering a variance term which decreases with more games played (hence, the rating becomes more and more accurate) and increases slightly with stagnation. Strong / returning / quickly improving players can therefore climb up the ranks faster (1000 SR to 3000 SR may be possible in 5 games). With the SR resetting every two months, this works out great as the rankings towards the end of the season truly reflect a very accurate portrayal of the true skill levels of all players. Even if the SR doesn’t reset, the variance can be set to not go below a minimum value, thus allowing for steady progress and at the same time keeping the rankings as accurate as possible.
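
For anyone who wants to see the mechanics, below is a rough sketch of the Glicko-1 update as I understand it from Glickman's published description. It is not Overwatch's formula and it is not TrueSkill. The function names and the inactivity constant `c=35.0` are my own assumptions for illustration. The rating deviation (RD) plays the role of the variance term: it shrinks when games are played and grows while a player is idle.

```python
import math

Q = math.log(10) / 400.0   # Glicko-1 scaling constant
MAX_RD = 350.0             # RD given to a completely unrated player


def g(rd):
    """Discount factor: results against uncertain opponents count for less."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected(r, r_opp, rd_opp):
    """Expected score against an opponent with rating r_opp and deviation rd_opp."""
    return 1.0 / (1.0 + 10 ** (-g(rd_opp) * (r - r_opp) / 400.0))


def glicko_update(r, rd, results):
    """Update (rating, RD) from one rating period.

    results is a list of (opp_rating, opp_rd, score), score in {0, 0.5, 1}.
    """
    d2_inv = 0.0   # information gained from the games (1 / d^2)
    delta = 0.0    # weighted sum of (actual - expected) scores
    for r_opp, rd_opp, score in results:
        e = expected(r, r_opp, rd_opp)
        g_opp = g(rd_opp)
        d2_inv += Q**2 * g_opp**2 * e * (1.0 - e)
        delta += g_opp * (score - e)
    denom = 1.0 / rd**2 + d2_inv
    return r + (Q / denom) * delta, math.sqrt(1.0 / denom)


def inflate_rd(rd, idle_periods, c=35.0):
    """Grow RD after inactivity; c is an assumed tuning constant."""
    return min(math.sqrt(rd**2 + c**2 * idle_periods), MAX_RD)


# Same single win over a 1500-rated opponent, very different step sizes.
new_player, _ = glicko_update(1500.0, 350.0, [(1500.0, 50.0, 1.0)])
settled_player, _ = glicko_update(1500.0, 50.0, [(1500.0, 50.0, 1.0)])
print(new_player - 1500.0, settled_player - 1500.0)
```

The uncertain player gains far more from the same win than the settled one, which is the behaviour described above. How fast that really plays out in a 6v6 team game is a separate question this sketch does not answer.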

Lichess uses Glicko in their chess rating system. Trueskill was developed by Microsoft for their ranking systems in Xbox games. ELO was/is always adopted in games with some sort of SR system due to legacy reasons, but it’s time we moved away from ELO and adopted a more rigorous system considering more aspects than just wins or losses.

I’d love to hear your thoughts and opinions on this!

6 Likes

Um, I’m pretty sure Overwatch has an uncertainty value in its MMR calculations that changes. I think that’s a given for most ELO-like system. Developer posts have implied before that your example where you don’t play competitive for a while is already handled the way you want; it applies more uncertainty if it’s been a long time.

4 Likes

I think that more accurate matchmaking could be achieved without having to “re-invent the wheel” so to speak. Matchmaking could be much much more accurate but the problem is that the actual SR range of the match grows the longer you’re in queue for it and it tries to compensate at the same time with some higher ranked players on each side or enough to get the match average near your SR / average group SR.

In short, more accurate = longer wait time.
This is true no matter how you calculate a rank, the more accurate ranked matches are the longer they will take. ^^
So the conversation in the office at blizzard isn’t “how do we make matches more accurate”, instead it is “Is it worth it to have more accurate matches”.
Which idk, I personally don’t really care but everyone is subject to change. Say we have an average wait time of 6 minutes per match and then the high ranked players have to wait 8 minutes or more. Is it worth the extra time?
every matches you’ll have spent around one match worth of time inside of the queue screen.

The original Elo system has an uncertainty term, called the K factor (see Elo rating system - Wikipedia). It is larger for newer players and lower ranked players.

Overwatch also has an uncertainty term. It is larger for new and recently inactive players. “Play a lot of games, it (MMR) gets more certain. Don’t play Overwatch for a while, it gets less certain. The more certain the matchmaker is about your MMR, the less your MMR will change in either direction based on a win or loss” --Scott Mercer (Overwatch Forums). The statement, “You go on a large win or loss streak, it gets less certain” is no longer valid, as win streak bonuses (and loss streak penalties) have been removed from the game.

Overwatch’s system resembles Trueskill strongly. See How Competitive Skill Rating Works (Season 9) for a more complete explainer.

3 Likes

Just from the obvious problem with the matchmaker’s already flawed math.

Using a simple number spread , say we ranked everyone in plat 1-10
a team comp of 2,2,2,8,8,8 would average 5.
a team comp of 1,3,5,6,7,8 would also average 5.

Now, computer logic would say these groups are the same. But we know that this skillset is not entirely linear. Whatever their mains are, whatever their team comp compatibility is, we don’t know this. So 5v5 looks GREAT to a simple computer setup, it plays out AWFUL in reality. From the baseline, I would assume as a human, the 2nd team would win, less assumed overall handicap, with more people relatively near the same top. But who knows, because SR is a joke.
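
The two example teams can be compared directly with a few lines of Python. This is only a sketch of the arithmetic in the post. It does not claim anything about how the real matchmaker compares teams, and `summarize` is a made-up helper name.

```python
from statistics import mean, pstdev

# The two hypothetical plat teams from the example, ranked 1-10.
team_a = [2, 2, 2, 8, 8, 8]
team_b = [1, 3, 5, 6, 7, 8]


def summarize(team):
    """Return the average plus a few simple measures of how spread out a team is."""
    return {
        "mean": mean(team),
        "spread": pstdev(team),   # population standard deviation
        "low": min(team),
        "high": max(team),
    }


for name, team in (("A", team_a), ("B", team_b)):
    print(name, summarize(team))
```

Both teams average 5, but the first has a noticeably larger spread, so a matcher that only looked at the mean would treat them as identical.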

1 Like

And Blizzard, as a company, has billions of data points to actually determine the likelihood of either team winning based on, you know, lots of actual data.

They SPECIFICALLY say that they don’t simply average MMR in groups. There is no reason to think that large MMR variations within teams occur without a pre-made group interfering with the available pool of players.

Like Kaawumba said, if you read what the Devs have said about the system and what the makers of Trueskill write about the system, it sounds nearly identical.

They did make some additional things for QP because there isn’t a competitive ruleset and the games themselves aren’t necessarily fair in QP, Attack vs. Defense, players leaving, etc., but those factors don’t matter in comp mode.

The system literally isn’t doing that however. It creates a baseline coefficient based on win/loss and >That’s it< They aren’t UTILIZING all the data for match making, otherwise you wouldn’t have a team of the same Dps mains all fighting for the same character. The less pre-Comp games you play in QP, the less it is able to pinpoint their made up number on where to put you, but if you clearly look at places like overbuff you can see all these weird spikes exactly at tier change points, because it’s slamming people from random skillsets together to make teams, which causes INSTANT infuriating steamroll matches at those points. That is not a system that is actively utilizing all its data on those players to set up comps, that’s a system dumbing it down to 1-2 variables and going ham with it. Everyone who knows anything about matchmaking systems knows these flaws, and actively questions why we are stuck with this system.

I agree.

There’s also the factor of how consistently a person plays. What’s the point of having a value precise to 10 digits, if a person can only play with a precision of 2 digits. You know?

There’s no point in adding queue time to make something more accurate if it’s too accurate to begin with. Almost any pairing will generate information that can be used to rank someone. The trade off isn’t accuracy, it’s game quality vs. queue time.

1 Like

THANK YOU

SOMEONE BROUGHT UP THEIR BUSTED ELO

Please, Blizzard, use Glicko. Even Pokemon sims use Glicko.
Get with the times.

You average 1.4 elims per life on soldier and barely 2 on a no skill hero like sym, while playing at 1.1k SR. Do you honestly believe it’s teammates holding you down and not your awful stats? You’d be at least diamond if you weren’t in “elo hell”?
There are too many people like you on these forums and I honestly can’t believe how delusional most of you are.

It’s never “how can I improve” but “how can my teammates improve” with you people. :thinking:

This is my favorite quote.

It’s a good thing that no developer has ever said that the system is actually an Elo system. That’s just shorthand from the gaming community to describe all ranking systems in general.

SO…proof that Elo hell doesn’t exist in OW…because it can only exist in an Elo system.

Thanks everyone for your replies. As most of you mentioned, the devs have incorporated a term similar to variance in their calculations, although they still call the rating system ‘ELO’ (where this term is kinda constant, and set to 400, if I remember correctly). I still don’t think that this is the case though, as the SR increase and decrease, at the start of a season, is pretty consistent with how ELO works and doesn’t at all resemble Glicko or Trueskill.

If some of you play chess, I’d recommend you try out lichess and see how your rating fluctuates rapidly initially, and then stabilizes. The world champion can thus, play one single tournament and reach his true rating almost immediately as opposed to the real chess system (ELO) where he’d need to play thousands, if not more, games to reach his true rating.
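
Here is a small sketch of that "fluctuates rapidly, then stabilizes" pattern. It uses the simpler Glicko-1 deviation formula rather than the Glicko-2 variant Lichess runs, treats every game as its own rating period, and assumes every opponent sits exactly at the player's current rating with an RD of 50. All of those are simplifying assumptions, and the helper names are made up.

```python
import math

Q = math.log(10) / 400.0   # Glicko-1 scaling constant


def g(rd):
    """Discount factor for an opponent's rating deviation."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def rd_after_even_game(rd, opp_rd=50.0):
    """New RD after one game against an equally rated opponent (expected score 0.5)."""
    info = Q**2 * g(opp_rd) ** 2 * 0.25   # E * (1 - E) with E = 0.5
    return math.sqrt(1.0 / (1.0 / rd**2 + info))


def win_gain(rd, opp_rd=50.0):
    """Rating points gained for winning that even game at the given starting RD."""
    new_rd = rd_after_even_game(rd, opp_rd)
    return Q * new_rd**2 * g(opp_rd) * 0.5


def rd_trajectory(start_rd=350.0, games=20):
    """RD before the first game and after each of the following games."""
    rds = [start_rd]
    for _ in range(games):
        rds.append(rd_after_even_game(rds[-1]))
    return rds


rds = rd_trajectory()
gains = [win_gain(rd) for rd in rds]
for game, (rd, gain) in enumerate(zip(rds, gains)):
    print(game, round(rd, 1), round(gain, 1))
```

The printed step size for a win drops quickly over the first few games and then flattens out, which matches the behaviour you see on a fresh account.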

I don’t know why you had to somehow try to counter my proposition by looking into how awful of a player I am. The point of the discussion was Elo vs Glicko. Never did I claim that I’d be a GM using Glicko. Honestly, my own SR doesn’t bother me. I am not a gamer. I played like a couple of FPS games my whole life and I like the mechanics and feel of Overwatch. I have a PhD in Computer Science and have done a lot of research on skill rankings in strategy games (These techniques are not limited to skill rankings and can be used in a variety of scenarios where quantitative ranking is difficult but pairwise comparisons are easy), and hence, I thought it’d be good to throw a suggestion in the air and incite some thoughtful discussions.

Given the current state of Overwatch match making, I’d be happy experimenting with ANY new idea. I also detest the hidden MMR mechanism.

You say this, but you do not seem to have put much research into Overwatch’s system.

Where do the developers say this?

In Overwatch, the rapid movement only happens for new or inactive accounts. There is no reset at the beginning of each season in Overwatch.

1 Like

We don’t know which system Overwatch does use, because Blizzard won’t tell us.

But I am certain that Blizzard is never going to use Trueskill, because they would have to license it from Microsoft.

Grasping at straws my dude. The op just destroyed your argument with intelligence and smarts and you’re just sitting there as an annoying git gud crusader who doesn’t understand the original point of the post.

Licensing agreements between software companies happen all the time, ESPECIALLY with MS. I’m not sure why you would think this? It would be a lot cheaper and better than making one from whole cloth, especially since the intellectual patents would likely overlap.

I used to play on that simulator. I used to play OU tier in top 500 consistently and every time I opened a new account I would go 30-0 until I got a 50 win percentage once I reached 1700 elo (my natural elo/top 500 elo).

With mmr in this game there is no way that would happen. I would automatically get placed with people who are just as good after 10 or so games and while I will get to my true rank eventually it would take way longer than it should.

Blizzard does this to make games more fun and fair across the ladder but it’s at the expense of people reaching their true rank as fast as possible.

I think Blizzard has a decision to make on their competitive game mode.

Do they want competitive play where matches are as fair as possible but it takes longer to get your true rank.

Or do they want genuine ranked play where people play for ranks and fairness is ignored and the better player always wins against weaker player.

Because Blizzard likes to develop such stuff in house, this is just their way. I mean, for ow they developed their own 3d engine, a stunt not many companies are capable of.

1 Like

The absolute steamrolls people continue to experience are very much an effect of ELO based matchmaking.