Thanks so much for pointing this out! You are correct. In a previous version of the text I normalized the calculation to count a win as +1 and a loss as -1 so that the expected value was 0. The resulting variance was 1/2, and the resulting standard deviation was 1/sqrt(2). When I changed versions and normalizations, I mistakenly carried the wrong standard deviation through and it affected all the calculations.

I made changes to the text to reflect the new numbers, and I credited you for finding the mistake. The upshot is that Overbuff is slightly more accurate than I had originally written.

Thank you for the recognition, but then I feel I should also ask whether the standard deviation of the mean of n trials really is 1/sqrt(2n).

Assuming the sample average of n trials converges to N(p, p(1-p)/n), in this case its variance would be 0.25/n, and taking the square root of that to find the SD gets you 0.5/sqrt(n) = 1/(2*sqrt(n)).

So then the SD you used would be a factor of 1/sqrt(2) too large and you would need to fix the numbers up again.

I mean, I think you don’t need to take the square root of the 0.5, since it already comes from taking the square root of 0.25 — only the n still needs it.
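As a quick sanity check of the formula above, here’s a small simulation (fair-coin trials; the choices of n and the number of runs are arbitrary) confirming that the SD of the sample mean is 1/(2*sqrt(n)) rather than 1/sqrt(2n):

```python
import random
import statistics

# Monte Carlo check: the SD of the mean of n fair-coin trials
# (win = 1, loss = 0, p = 0.5) should be 0.5/sqrt(n) = 1/(2*sqrt(n)),
# not 1/sqrt(2n).
random.seed(1)
n = 400        # trials per sample mean
runs = 20_000  # number of sample means to draw

means = [sum(random.random() < 0.5 for _ in range(n)) / n for _ in range(runs)]
empirical_sd = statistics.pstdev(means)
predicted_sd = 0.5 / n ** 0.5   # 1/(2*sqrt(n)) = 0.025 for n = 400
wrong_sd = 1 / (2 * n) ** 0.5   # 1/sqrt(2n), about 0.0354 here

print(empirical_sd, predicted_sd, wrong_sd)
```

The empirical SD lands right on 0.5/sqrt(n), a factor of sqrt(2) below the 1/sqrt(2n) figure.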

You’re right - too many algebra errors. That’s what I get for writing long posts in a hurry in the middle of the morning while working on other things. The numbers are now corrected, with some more data feeding into the total player count as well. The sizes end up being a bit smaller, but the accuracy remains largely the same due to having a smaller standard deviation.

I knew the conclusion wouldn’t change, but if this thread is to be used as a reference later (by me, say), the math needs to be on point. There are people who would now discredit Overbuff unnecessarily harshly.

Thanks for the fixes.

I completely agree with you. There are a lot of misconceptions about Overbuff data and probability in general going around, and it’s important to keep the details straight. Honestly, I probably shouldn’t have written it in such a rush and should have proofread it more carefully, but sometimes more urgent things come up. Thanks for taking the time to read carefully.

I mean, anyone with the most basic understanding of statistics knows the remaining pool of matches is more than sufficient to get a reasonably close approximation of the tested population. But this isn’t a random sample of the OW playerbase anymore. Actually, it never was: it was always going to be biased by the opt-in nature of these stat sites (where stats get tracked based on games played by people who have been searched on the site before). But requiring an opt-in even to report stats in the first place, and making opting out the default, means you could be creating estimates for a dramatically different population.

For example, if I was a Torb or a Sym one-trick, or if I played only Mercy and wanted to branch into dps… these are situations where you have far less incentive to opt into public profiles. If you’re a DPS main who’s just going to auto-lock dps every game, then you don’t really have an incentive to hide a long history of playing dps, because it’s probably helpful when joining a LFG or whatever.

Idk why you’d focus the whole post on things like standard deviations of sample statistics. If you want to show the post-profile-change stats represent a similar population to the one before the profile changes, just compare pick and win rates in the two-week periods before and after. If the only significant changes can easily be explained by balance changes (or if there are no significant changes), you’re good to go. Otherwise there’s an argument that it has degraded the quality of previous info.

I’m not particularly invested - just surprised that you would focus on sample size of games in a game as popular as OW.

Developmental Psych major here taking stats and experimental design, good read OP

The Overbuff data set is opt-in, and that does create a self-selected population. There’s no getting around that. Unfortunately, that’s all the data we have to work with. While the self-selection might influence many things in the data set, I think there’s good reason to doubt that it’s in any way strongly correlated with skill at playing any one given hero.

Regarding private profiles - I actually did the comparison you described (or at least some version of it; I wanted to be careful to avoid issues with early adopters, so I took two-week periods that were themselves two weeks away on either side of the introduction of private profiles). There was no influence of private profiles on win rates. Perhaps I should have elaborated on this further in the post, but it was getting long and my time was short.
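For anyone who wants to run this kind of before/after comparison themselves, a two-proportion z-test is one standard way to check whether a win rate shifted between the two periods. The counts below are made-up placeholders, not Overbuff’s actual numbers:

```python
import math

# Hedged sketch: two-proportion z-test on win rates from two periods.
# The win and game counts are illustrative placeholders only.
wins_before, games_before = 51_200, 100_000
wins_after,  games_after  = 50_900, 100_000

p1 = wins_before / games_before
p2 = wins_after / games_after
p_pool = (wins_before + wins_after) / (games_before + games_after)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / games_before + 1 / games_after))
z = (p1 - p2) / se

# |z| below ~2 means the observed difference is consistent with noise
# at the usual 5% significance level.
print(round(z, 2))
```

With samples this large, even a difference of a few tenths of a percentage point is resolvable, which is why a null result here is informative.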

I had several reasons to focus on the sample sizes and the standard deviations. The first was that I was curious what the order of magnitude of the sample size was.

The second was to address several claims I saw floating around in the forums, about how partial samples of a data set could not be used to make deductions about the data set at large and about how the percentage of private vs. public profiles rendered Overbuff unreliable. These are claims that I felt needed a somewhat more nuanced explanation.

The third and central reason was that I wanted some quantitative information. If you use these statistics, how much confidence should you have in them? It wasn’t enough for me to convince people that private profiles don’t matter much. I wanted them to understand what level of resolution they need to use when they look at, say, GM Mercy stats. I found that people on both sides of the question were making claims based on daily data in these high ranks, and I wanted to dissuade them from it.
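That resolution question can be made concrete with the 95% margin of error on an observed win rate, roughly 1.96 * sqrt(p(1-p)/n). The sample sizes below are illustrative round numbers, not Overbuff figures:

```python
import math

# Sketch of the "resolution" question: half-width of a 95%
# normal-approximation confidence interval for a win rate.
# Sample sizes are illustrative, not Overbuff's actual counts.
def margin_of_error(n, p=0.5):
    """95% margin of error for a win rate estimated from n games."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (100, 1_000, 10_000, 100_000):
    print(n, round(100 * margin_of_error(n), 2), "percentage points")
```

At 100 games the margin is nearly ±10 percentage points, which is why daily data for a rare hero at a high rank can’t support fine-grained claims, while tens of thousands of games pin the rate down to well under a point.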

I certainly don’t claim that there’s no room for further statistical arguments here, but I think that the post adds something valuable to the discussion.

Fair enough, I think it adds something to the forums as well.

I didn’t think private vs. public would necessarily correlate with win rate. I expected that if any change occurred, it would be more correlated with pick rate. If you didn’t see any changes in that for niche and unpopular picks, then I think it’s fair to say we’re looking at a similar population as before the private profile implementation. The reason is that people get bugged or told to play particular heroes based more on who they play than on their actual stats.

Yeah, that’s certainly a cause for concern when thinking about private profiles, but it doesn’t seem to be significant enough in Overbuff itself to affect the averages. At least not the win rate averages. The Mercy pick rates were rising during that time, so they obviously didn’t remain constant, but the increase proceeded at approximately the same rate on both sides of the change.