This post was conceived as a companion piece to help evaluate my previous post (A month of Mercy nerf in numbers). Other than that, it’s not really going to be about Mercy (though I use her to work out examples). It’s about everyone’s other favorite subject - math. More specifically, I want to address the question: how much trust should we place in Overbuff statistics?
I’ve seen a lot of people claim that Overbuff’s numbers are meaningless because of private profiles (or because, even before private profiles, Overbuff only received statistics from a subset of the player base). This claim is false. You absolutely can draw very good conclusions about a large data set by sampling a sizable piece of it at random.
Here’s an example. A very big bag is full of blue tokens and red tokens. The bag is shaken thoroughly to mix up the tokens. Then you spill 100,000 tokens on the floor. Of these, 60,000 are blue and 40,000 are red. The remaining tokens stay in the bag (in the Overwatch context, these are the “private” tokens). What percent of the tokens in the bag are blue?
You can’t know for sure without looking at every single token. What you can do is make claims, and then give probabilities that they’re true. The bag should contain about 60% blue tokens and 40% red tokens. If you claim that the bag has exactly 60% blue tokens, then you are probably wrong. If you claim that the bag has somewhere between 59% and 61% blue tokens, then you are almost surely right (with a probability that can be calculated and shown to be very, very close to 100%). The more tokens you draw from the bag, the closer the percentage should be to the actual percentage in the whole bag. In math, we call this the law of large numbers.
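If you’d like to see this in action, here’s a minimal sketch in Python. The seed, the exact 60/40 split, and the approximation of a “very big bag” by independent draws are all just illustration:

```python
import math
import random

random.seed(0)

# The bag is "very big", so drawing 100,000 tokens at random is essentially
# the same as 100,000 independent draws that each come up blue 60% of the time.
n = 100_000
p_true = 0.60
blue = sum(random.random() < p_true for _ in range(n))
observed = blue / n

# Standard error of a sampled proportion: sqrt(p * (1 - p) / n).
se = math.sqrt(p_true * (1 - p_true) / n)
print(f"observed blue fraction: {observed:.2%}")
print(f"standard error: {se:.3%}")

# The 59%-61% window extends about 6.5 standard errors to each side of 60%,
# which is why the "somewhere between 59% and 61%" claim is almost surely right.
print(f"half-width of the 59%-61% window in standard errors: {0.01 / se:.1f}")
```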
You end up replacing a certain claim (exactly 60%) with a more nuanced one, together with a measure of how sure you are about it. This shouldn’t worry you. Your entire life is lived like this. For instance, your brain processes information from your eyes at a certain frame rate, and your eyes themselves have many missing pixels (so there are “private” pixels and “private” moments in time). Everything you see at all times is based on probability. That’s the way almost all of human experience works.
The situation in Overwatch is almost identical, except we’re not sampling entirely at random. For instance, we’re only sampling people with public profiles. The good news is that there’s excellent reason (supported by data analysis) to believe that private profiles aren’t correlated with, say, skill at playing Roadhog. So the conclusions still hold.
So, now on to the main question that I wanted to discuss. How much should you trust Overbuff data? Like in the bag of tokens example, it really depends on how many games Overbuff records. If you drew 2 tokens out of the bag, your conclusions would be off. If you drew a trillion, they would be really good. So, how many games does Overbuff record?
We can’t know exactly without someone from Overbuff chiming in, but we can get a very good estimate. The thing that comes into play here, and will be important for the discussion later on, is a beautiful theorem called the central limit theorem. I won’t state the theorem itself, since it’s a bit too technical for a forum post, but it deals exactly with the kind of situation we have here - repeating a random event over and over again and trying to understand its averages. The central limit theorem is the source of the bell curves that you might have seen pop up in all kinds of aspects of your life.
Suppose you record N games in which Mercy was played (I told you there would be a little bit of Mercy here). Suppose that Mercy is balanced such that she should win about b% of the time, but that in those recorded games she won about a%. How far apart can we reasonably expect a% and b% to be? One thing that the central limit theorem (and its quantitative companion, the Berry-Esseen theorem) says is that there is about a 68% chance that a% and b% are within one standard deviation of each other. Since b% is pretty close to 50%, the standard deviation of a single game’s outcome is close to its maximum of 1/2, which would give a margin of 50/sqrt(N) percentage points; the margin used throughout this post, 100/sqrt(1.4 N) percentage points, is somewhat wider than that, so it errs on the conservative side.
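If you’d rather not take the theorem on faith, you can simulate it. The sketch below is illustrative only (the game count, trial count, and seed are arbitrary); it also shows that the 100/sqrt(1.4 N) margin is wider than the exact binomial one, so its stated confidence levels come out on the safe side:

```python
import math
import random

def post_margin(n: int) -> float:
    """One-standard-deviation margin used in this post, in percentage points."""
    return 100 / math.sqrt(1.4 * n)

def binomial_margin(n: int) -> float:
    """Textbook one-standard-error margin for a near-50% win rate: 50 / sqrt(N)."""
    return 50 / math.sqrt(n)

# Simulate N games at a true 50% win rate, many times over, and count how
# often the observed win rate lands within each margin of the true rate.
random.seed(1)
n_games, trials, b = 10_000, 1_000, 0.50
within_post = within_binom = 0
for _ in range(trials):
    wins = sum(random.random() < b for _ in range(n_games))
    deviation = abs(100 * wins / n_games - 100 * b)
    within_post += deviation <= post_margin(n_games)
    within_binom += deviation <= binomial_margin(n_games)

print(f"post margin {post_margin(n_games):.2f} pp -> coverage {within_post / trials:.0%}")
print(f"binomial margin {binomial_margin(n_games):.2f} pp -> coverage {within_binom / trials:.0%} (about 68%)")
```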
So I looked at the time period between late June (when private profiles were introduced) and early August (when the support balance patch came in) and found several characters whose pick rates were stable (Mercy was one of them). For each day, I found how far the win rate for that day was from the average win rate during that time period. Then I found the cutoff that 68% of those daily deviations fell below, set it equal to one standard deviation, and solved for N.
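Here’s a sketch of that back-solving step. The daily win rates below are invented purely for illustration (the real calculation used Overbuff’s actual day-by-day numbers):

```python
import math

# Hypothetical daily win rates (in percent) over a stable stretch.
daily_win_rates = [49.6, 50.3, 50.1, 49.2, 50.8, 49.9, 50.4,
                   50.0, 49.5, 50.6, 49.8, 50.2, 49.7, 50.5]

mean = sum(daily_win_rates) / len(daily_win_rates)
deviations = sorted(abs(x - mean) for x in daily_win_rates)

# The cutoff that ~68% of the daily deviations fall below is roughly one
# standard deviation of a single day's win rate.
one_sd = deviations[int(0.68 * len(deviations))]

# Invert the margin formula 100 / sqrt(1.4 * N) to solve for N, the number
# of recorded games per day in which this hero appeared.
n_games = 100**2 / (1.4 * one_sd**2)
print(f"estimated one-standard-deviation margin: {one_sd:.2f} pp")
print(f"implied recorded games per day for this hero: {n_games:,.0f}")
```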
Now, this is only an estimate, but the upshot is that Overbuff records around 100,000 games a day. Using Blizzard’s statistics about player distribution (yes, I know that player distribution isn’t the same as the distribution of the number of games per day, but they’re close), that works out to roughly 8,000 games a day in Bronze, 21,000 in Silver, 32,000 in Gold, 25,000 in Platinum, 10,000 in Diamond, 3,000 in Masters, and 1,000 in GM.
Now back to the central limit theorem. Say we are trying to understand Mercy’s win rate across all ranks. Mercy’s pick rate in all ranks is currently 7.32%, so she appears in about 7.32% * 6 * 100,000 = 43,920 recorded games a day. The central limit theorem tells us that there is about a 68% chance that her observed average is within one standard deviation of the actual average. This means we have a 68% level of confidence that the Overbuff average is at most 0.4% away from the actual average, and about a 95% confidence that it is at most 0.8% away (two standard deviations). That’s about the level of trust we should have in a daily all-ranks win rate on Overbuff.
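The arithmetic fits in a few lines. A small sketch (the factor of 6 mirrors the 7.32% * 6 * 100,000 computation above; the margin formula is the one from earlier):

```python
import math

def margin(pick_rate: float, games_per_day: int, sds: float = 1.0) -> float:
    """Expected gap (in percentage points) between observed and true win
    rate, at the given number of standard deviations."""
    n = pick_rate * 6 * games_per_day  # hero appearances per day
    return sds * 100 / math.sqrt(1.4 * n)

# Mercy across all ranks: 7.32% pick rate, ~100,000 recorded games a day.
print(f"one standard deviation:  {margin(0.0732, 100_000):.2f} pp (about 68% confidence)")
print(f"two standard deviations: {margin(0.0732, 100_000, 2):.2f} pp (about 95% confidence)")
```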
In Diamond, for instance, we get a 68% chance that her daily average is within 1.4% of the correct average and about a 95% confidence that it’s within 2.8%. That’s not amazing. But we can improve it by quite a bit. The first thing we can do is limit the direction - we have an 84% confidence that the correct average is not MORE than 1.4% above the Overbuff average for that day (in this case I don’t really care about it being less), and a 98% confidence that it’s not more than 2.8% above the Overbuff average.
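Where do 68%, 84%, 95%, and 98% come from? They’re just areas under the bell curve at one and two standard deviations, which you can compute with nothing but the standard library:

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (1, 2):
    two_sided = normal_cdf(z) - normal_cdf(-z)  # within z sds in either direction
    one_sided = normal_cdf(z)                   # not more than z sds in one direction
    print(f"{z} sd: two-sided {two_sided:.0%}, one-sided {one_sided:.0%}")

# 1 sd: two-sided 68%, one-sided 84%
# 2 sd: two-sided 95%, one-sided 98%
```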
Better, but still not great. So, what do we do? We take more games. We calculate the average win rate over a whole week (if you want to be accurate, you should weight each day by its pick rate). This pools 7 times as many games, so the accuracy improves. When you do this, you get an 84% confidence that the actual average is not more than 0.5% above the Overbuff weekly average in Diamond, and a 98% confidence that it’s not more than 1% above it. And if that accuracy isn’t enough, you can average over a whole month. This gives you an 84% confidence that you’re not more than 0.25% too low, and a 98% confidence that you’re not more than 0.5% too low. Not bad. In all ranks, the monthly average has about an 84% chance of not being more than 0.073% too low, and a 98% chance of not being more than 0.146% too low.
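In code, pooling days just multiplies the game count. The sketch below reuses the all-ranks 7.32% pick rate because I don’t have Mercy’s Diamond-specific pick rate, so the output lands close to, but not exactly on, the figures above:

```python
import math

def pooled_margin(pick_rate: float, games_per_day: int, days: int = 1,
                  sds: float = 1.0) -> float:
    """Margin in percentage points after pooling `days` days of games."""
    n = pick_rate * 6 * games_per_day * days
    return sds * 100 / math.sqrt(1.4 * n)

# Mercy in Diamond: ~10,000 recorded games a day.
for days, label in [(1, "daily"), (7, "weekly"), (30, "monthly")]:
    one_sd = pooled_margin(0.0732, 10_000, days)
    two_sd = pooled_margin(0.0732, 10_000, days, 2)
    print(f"{label:>7}: 84% -> not more than {one_sd:.2f} pp too low, "
          f"98% -> not more than {two_sd:.2f} pp too low")
```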
In Masters and GM, you really shouldn’t pay attention to daily numbers at all - only to weekly and monthly averages, especially for characters with low pick rates. Mercy’s pick rate in GM is about 2%, which means Overbuff only records about 100 Mercy games a day there. With that little data, you only have a 68% confidence of being within 8.3% of the correct number (or an 84% chance of not being more than 8.3% too low). That’s really bad. However, if you average over a week, then you’re 84% confident you’re not more than 3% too low, and if you average over a month, then you’re 84% confident you’re not more than 1.5% too low. Similarly, in Masters, if you average over a month you’re 84% confident that you’re not more than 0.85% too low. Note: you have to be careful when averaging over a month. The days with higher pick rates should count more than the days with lower pick rates, because they represent more games. And if this level of confidence isn’t enough? Wait another month and average again. Doubling the data divides the margins attached to your 84% and 98% confidence levels by about 1.4 (that is, by sqrt(2)).
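Here’s what that pick-rate weighting looks like in practice. The GM numbers below are made up for illustration; the point is only that days with more recorded games pull the average harder:

```python
# Hypothetical daily GM numbers for one week: (pick rate, win rate in %).
week = [(0.021, 48.2), (0.018, 51.5), (0.023, 49.0), (0.019, 50.8),
        (0.022, 49.6), (0.020, 50.1), (0.017, 48.9)]

# Weight each day's win rate by its pick rate, since pick rate is a proxy
# for how many games that day contributed.
total_weight = sum(pick for pick, _ in week)
weighted_avg = sum(pick * win for pick, win in week) / total_weight
plain_avg = sum(win for _, win in week) / len(week)

print(f"plain average:    {plain_avg:.2f}%")
print(f"weighted average: {weighted_avg:.2f}%")
```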
The numbers given here are just guidelines. They’re based on estimates and approximations, but the math behind them is very real, and they make good guidelines. There are other places to go from here that can increase the confidence levels even further. For instance, if the character you’re interested in spends more days below a certain number than above it, that increases your confidence that the number is low. Or, if the character you’re interested in spends most days very close to a certain number, that increases the likelihood that this number is close to the correct average.
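For the first heuristic, one standard way to put a number on it is a sign test: if the true average really sat at the proposed value, each day would be (roughly) equally likely to land above or below it. A minimal sketch, with made-up counts:

```python
import math

def sign_test_pvalue(days_below: int, total_days: int) -> float:
    """Probability of seeing at least `days_below` days under a threshold,
    out of `total_days`, if each day were independently 50/50 to land
    above or below it."""
    favorable = sum(math.comb(total_days, k)
                    for k in range(days_below, total_days + 1))
    return favorable / 2**total_days

# E.g. 22 of 30 days below a proposed win rate: very unlikely if the true
# average were really that high, so the proposed value is probably too high.
print(f"p = {sign_test_pvalue(22, 30):.3f}")
```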
I hope this helps people interpret data in future discussions (and not only Mercy-based ones).