Ret and Fury: Superior parses in the 99th and 100th percentile == superior scaling? Or not?

In the discussions around Ret and Fury relative performance, one idea has been presented that on the surface has merit.

The idea advocates looking at parses in the 99th and 100th percentiles to capture the top players with the best gear, on the premise that their results should represent scaling performance.

This has been applied to comparisons between Ret and Fury: at the 95th percentile and lower, Ret outperforms Fury, but Fury retakes the lead at the 99th and 100th percentiles.

Example shown here:

100th
https://classic.warcraftlogs.com/zone/statistics/1017#dataset=100&sample=7

99th
https://classic.warcraftlogs.com/zone/statistics/1017#dataset=99&sample=7

95th
https://classic.warcraftlogs.com/zone/statistics/1017#dataset=95&sample=7

However, I think this analysis is flawed, as it doesn’t account for classes with higher, gear-independent RNG peaks in damage: classes with a set percentage chance to proc abilities, like Fire Mage and Fury. The 99th and 100th percentiles will always select for the lucky proc runs, giving those classes much better outcomes there than in the lower percentile ranges. They have a higher peak, and the 99th and 100th percentile parses capture that peak. These should be treated as outliers, not as indicative of class scaling.
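To make the selection effect concrete, here’s a minimal sketch with two hypothetical classes. It assumes each class’s parses are normally distributed (real parse distributions won’t be exactly normal), and the DPS numbers are invented, not pulled from Warcraft Logs. The "swingy" class has a lower average but more RNG variance, and nothing about gear differs between them.

```python
from statistics import NormalDist

# Two HYPOTHETICAL classes. DPS figures are invented for illustration and are
# not taken from Warcraft Logs; parses are assumed to be normally distributed.
steady = NormalDist(mu=10_000, sigma=500)  # higher average, low RNG variance
swingy = NormalDist(mu=9_400, sigma=800)   # lower average, high RNG variance


def dps_at(dist: NormalDist, percentile: float) -> float:
    """DPS a parse needs to sit exactly at `percentile` of `dist`."""
    return dist.inv_cdf(percentile / 100)


# 99.9 stands in for the "100th" (best parse) column, since inv_cdf(1.0) raises an error.
for pct in (50, 95, 99, 99.9):
    a, b = dps_at(steady, pct), dps_at(swingy, pct)
    leader = "steady" if a > b else "swingy"
    print(f"{pct:>5}th  steady={a:7.0f}  swingy={b:7.0f}  ahead: {leader}")

# The curves cross where mu_steady + z * sigma_steady == mu_swingy + z * sigma_swingy.
z_cross = (steady.mean - swingy.mean) / (swingy.stdev - steady.stdev)
print(f"swingy overtakes steady above the {NormalDist().cdf(z_cross) * 100:.1f}th percentile")
```

With these made-up numbers the steady class leads at the 50th and 95th percentiles, the swingy class leads at the 99th and 99.9th, and the two cross at about the 97.7th percentile. That doesn’t show Fury actually behaves this way; it only shows that a 95-vs-99 ranking flip can come purely from variance, with no scaling difference at all.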

How do I prove this? Isn’t it just one conjecture versus another?

Currently, yes. However, let’s set up a hypothesis to test the theory that Fury just naturally has higher peaks.

Hypothesis:
If this is true, and Fury’s much better performance in the 99th and 100th percentiles comes from higher peaks rather than superior gear in that bracket, then we should see the same thing occur in end-of-phase Nax, where pretty much everyone has phase-max gear.

Let’s see if that is the case:

As expected, at the 95th percentile Fury come in 17th:
https://classic.warcraftlogs.com/zone/statistics/1015/#region=1&dataset=95

At the 99th percentile, Fury are still 17th (though there is a sharp jump at this point):
https://classic.warcraftlogs.com/zone/statistics/1015/#region=1&dataset=99

And at the 100th, Fury jump up to 15th:
https://classic.warcraftlogs.com/zone/statistics/1015/#region=1&dataset=100

Sample sizes for Nax at all percentiles are much larger than those we have for one week of Ulduar (covering the Ret buff period). Even so, there is a clear bump in Fury’s performance relative to other classes at the 99th and 100th percentiles, with an actual rank change at the 100th.

This provides clear evidence that Fury has an intrinsically higher RNG damage ceiling than many other classes, independent of gear scaling. Rather than Fury simply scaling better with better gear, as some have conjectured from this week’s Ulduar results, the Ulduar parses fit the same pattern as end-of-phase Nax.

Conclusion:
Results confirm that Fury has higher RNG peaks, and that higher percentiles are not sufficient data points to provide evidence for Fury’s superior scaling with gear.

Note - This data doesn’t do anything to disprove the conjecture that Fury scale better with gear; it only provides evidence that the 99th and 100th percentile parses can’t be used to accurately show scaling performance. I also think the use of 99th and 100th percentile data to approximate scaling has been done in good faith, but it is a fundamentally flawed approach.

Further analysis can be done to bolster or weaken this case by analyzing the number of parses for each gear bracket in each of the percentiles to determine if there actually is a significant gear difference between the 95ers and the 99ers.
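A rough sketch of that check, assuming you already have each parse’s percentile rank and average item level. The records below are made up, and the 5-ilvl bracket width is an arbitrary choice, not Warcraft Logs’ own brackets.

```python
from collections import Counter

# HYPOTHETICAL parse records: (percentile rank of the parse, average item level).
# A real check would pull these from logs; these values are made up.
parses = [
    (95, 226), (96, 229), (97, 228), (98, 231),
    (99, 232), (99, 230), (100, 233),
]


def band_of(rank: int) -> str:
    """Group a parse's percentile rank into the bands argued about in this thread."""
    if rank >= 100:
        return "100"
    if rank >= 99:
        return "99"
    if rank >= 95:
        return "95-98"
    return "<95"


def gear_bracket(ilvl: int, width: int = 5) -> str:
    """Bucket item level into fixed-width brackets (the width is an assumption)."""
    low = ilvl - ilvl % width
    return f"{low}-{low + width - 1}"


def crosstab(records):
    """Count parses for each (percentile band, gear bracket) pair."""
    return Counter((band_of(rank), gear_bracket(ilvl)) for rank, ilvl in records)


table = crosstab(parses)
for band in ("95-98", "99", "100"):
    row = {bracket: n for (b, bracket), n in table.items() if b == band}
    print(band, dict(sorted(row.items())))
```

With real data, if the 99 and 100 rows aren’t noticeably skewed toward higher gear brackets than the 95-98 row, the "better gear" explanation for the jump gets weaker. The toy records here are far too few to say anything.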

TL;DR:
Selecting only from the 99th and 100th percentiles is selecting from outliers and is a misuse of the data set. There is strong selection bias for good RNG in the 99th and 100th percentiles.

Just look at the statistics based on ilvl. Across all percentiles in the 235-237 ilvl bracket, Fury is 1.5k DPS ahead of Retribution overall.

At around 200 ilvl, Ret is doing better, but as ilvl climbs (which suggests better players as well), Fury dumpsters on Ret.

In Nax, that was pre-buff.

In this I’m not arguing that Fury don’t scale, just that the focus on the 99th and 100th percentile is not a convincing way to measure scaling.

Ilvl comparison is a much better way to put the scaling case - as you’re doing.

Post-patch, Ret is ahead of Fury at the 95th percentile in the highest measured gear bracket, though it’s a small sample (only 3 Fury parses in that range, for example), so I wouldn’t call that conclusive.

Essentially we don’t have the data yet to measure scaling in Ulduar in a meaningful way.

No, I’m talking about Ulduar now. At around 200 ilvl, across all percentiles, Ret is at around 4.4k and Fury is at like 3.3k-ish. But increase that by 30+ ilvls and Ret is only doing about 7.9k while Fury is doing around 9.4k.
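Putting those quoted figures into DPS gained per item level makes the scaling comparison a bit more concrete. This is a minimal two-point slope that takes "3.3?ish" as 3,300 and "30+ ilvls" as exactly 30, so it’s rough at best.

```python
# Approximate Ulduar figures quoted above (all percentiles): (ilvl, DPS).
# ASSUMPTIONS: "3.3?ish" is taken as 3,300 and "30+ ilvls" as exactly 30,
# so treat the results as rough.
quoted = {
    "Ret": ((200, 4_400), (230, 7_900)),
    "Fury": ((200, 3_300), (230, 9_400)),
}


def dps_per_ilvl(low: tuple[int, int], high: tuple[int, int]) -> float:
    """Two-point slope: DPS gained per item level between two brackets."""
    (ilvl_lo, dps_lo), (ilvl_hi, dps_hi) = low, high
    return (dps_hi - dps_lo) / (ilvl_hi - ilvl_lo)


for spec, (low, high) in quoted.items():
    print(f"{spec}: ~{dps_per_ilvl(low, high):.0f} DPS per item level")
```

On those numbers that’s roughly 117 DPS per item level for Ret versus roughly 203 for Fury. If the real ilvl gap is more than 30, both slopes shrink by the same factor, so the ratio between them holds.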

People were trying to say that, with these Ret buffs, Fury would be under Ret from now until ICC.

Again, you’re not arguing against the position I’m taking. I agree - ilvl comparisons are a better way to measure relative scaling performance.

Honestly, I just want to see people stop using the 99th and 100th percentile parses to prove something they really don’t.

I wish I could find a girl to look at me like OP looks at ret parses…

Maybe some day.


To be fair I only have eyes for Fury - and my focus is on Fury parses ;p

My sleight of hand here is that my post isn’t really about Ret; it’s about Fury.

They are using those to simulate the ilvl jump. Those in the 99th and 100th percentiles across all ilvls would be the higher-ilvl players.

Sooooo…

Which they don’t accurately do. Using the 99th and 100th percentile parses in this way is very misleading. Especially when using Fury as an example - which we have established has a higher than average RNG peak in damage output.

Ret is already ahead of Fury with the buff, and it has been 5 days. Look at the logs.

But is it misleading?

Yes, yes it is.

We are, and Fury is scaling better than Ret with the ilvl buffs.

Sure because it’s early into the tier.

The point is Ret won’t scale, which we knew, and Fury will scale, which we knew.

I don’t understand the point of the thread though?

Are you just trying to show, with a very small sample size, that Fury outscales Retribution?

If so I’m pretty sure everybody knew this already.

I’m simply trying to show that the popular usage of the 99th and 100th percentile parses to demonstrate scaling is not a useful metric. That’s it - that’s the whole point.

Actually, not trying to show it - succeeding. It’s debunked - read the op again to see how.

Kelliste came back with ilvl comparisons - great, that’s a much better metric (though we don’t yet have enough data on it). If that’s how this discussion progresses, I consider my job done.

My desire is to ground my opinions in fact, and not simply accept “collective wisdom” or assumptions. I’m happy to change my views. So, the aim here is to improve the quality of discourse.

I’m not trying to win the argument, I’m trying to contribute to establishing a stronger fact base from which to argue.

My actual opinion? I think it’s too early in the phase to make any definitive claims about Fury’s relative scaling performance based on current data sets. I don’t know and I don’t think you know either.

Further to this:
To explain the dramatic performance peak in the 99th and 100th percentiles, you need to assume either:

  1. That a very minor difference in player behavior is having a disproportionately large benefit on Fury’s relative performance, while larger skill improvements in lower brackets are not having this effect. That seems a pretty big assumption without any evidence to back it up. Or
  2. That there is a sharp jump in gear performance, such that the rise in relative performance kicks in from the 99th percentile on rather than as a gradient. Again, a pretty big and unfounded assumption.

Making these types of assumptions to explain the performance jump is nigh on using numerology to explain the volume of a cup of water.

The simpler explanation is the hypothesis that Fury have a higher RNG ceiling, and that this is disproportionately captured in the best of the best parses. Analysis of Slam proc rates confirms this. The rest is explained by the interplay between more crits in the Heroism window and rage generation.
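A minimal sketch of why proc-based damage has a wide spread even with identical gear: model each fight’s procs as independent rolls at a flat chance. The 100 opportunities per fight and 20% chance are hypothetical placeholders, not real talent or ability values.

```python
from math import comb

# HYPOTHETICAL inputs: proc opportunities per fight and chance per roll are
# made up, not actual talent values. Rolls are assumed independent with a
# flat chance, so the proc count per fight is binomially distributed.
OPPORTUNITIES = 100
PROC_CHANCE = 0.2


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k procs from n independent rolls at chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_quantile(q: float, n: int, p: float) -> int:
    """Smallest proc count k with P(procs <= k) >= q."""
    running = 0.0
    for k in range(n + 1):
        running += binom_pmf(k, n, p)
        if running >= q:
            return k
    return n  # guard against floating-point shortfall near q = 1


for pct in (50, 95, 99):
    procs = binom_quantile(pct / 100, OPPORTUNITIES, PROC_CHANCE)
    print(f"{pct}th percentile fight: {procs} procs")
```

Same gear and same rotation, yet the 95th and 99th percentile fights land well above the median proc count, and a top-percentile parse is disproportionately drawn from that lucky tail.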

In short - using outliers as your basis is mathemagics not mathematics.

The 95th percentile seems a better-quality sample to use, but not to the exclusion of all else. Trends should be analyzed as just that: trends over multiple percentiles.

If we want to assess scaling for the phase, we should look at performance at specific gear brackets. We don’t have large enough samples to do that at the higher gear brackets yet, so we need to wait until we do, rather than make half-baked claims as if they’re facts. Better to say you don’t know than to use mathemagics to “prove” the unprovable.

I’ve had a look at the parses of the last week by gear bracket and it does look like Fury do quite well out of it (and Ret not so much), but it is way too early to tell. The sample sizes at higher gear brackets are extremely low.

I mean, you can even disregard the outliers using statistical methods. Real, rigorous math. Even a 2-tailed significance test (which isn’t as well suited to comparing top-end performance; you’d likely use a 1-tailed test) would show exactly your point: that they differ significantly enough from the rest not to be considered part of the normal population.
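A minimal sketch of the one- vs two-tailed point, using a plain z-score test against the rest of the sample. It assumes the rest is roughly normal, and the numbers are hypothetical. This is not Grubbs’ test or anything more rigorous; because the tested value was picked as the top one, a proper outlier test would apply a stricter threshold.

```python
from statistics import NormalDist, mean, stdev


def upper_outlier_test(value: float, rest: list[float]) -> tuple[float, float, float]:
    """Return (z, one-tailed p, two-tailed p) for `value` against `rest`.

    A plain z-score test assuming `rest` is roughly normal. It is NOT Grubbs'
    test: it ignores that `value` was picked as the top result, so it is
    looser than a proper outlier test.
    """
    z = (value - mean(rest)) / stdev(rest)
    one_tailed = NormalDist().cdf(-z)           # P(Z >= z): "is it unusually high?"
    two_tailed = 2 * NormalDist().cdf(-abs(z))  # P(|Z| >= |z|): "unusual either way?"
    return z, one_tailed, two_tailed


# HYPOTHETICAL DPS values for the bulk of a sample, plus one top value to test.
rest = [9800, 10050, 9950, 10100, 9900, 10000, 10200, 9850]
z, p1, p2 = upper_outlier_test(10220, rest)
print(f"z={z:.2f}  one-tailed p={p1:.3f}  two-tailed p={p2:.3f}")
```

With these made-up values the top parse comes in at about z = 1.79, which clears p < 0.05 one-tailed (about 0.037) but not two-tailed (about 0.074). When the only question is whether the top end is unusually high, the one-tailed test is the natural fit.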


Can, or should ;p

And yes - the Fury results at the 99th and 100th percentiles are different enough from the trend across the rest of the population that they should be considered outliers. Any self-respecting statistician would.
