Your question is wrong. The CLT states that as the sample size goes to infinity, the sample converges on normality. You can’t get an infinite sample, so a better question is: what sample size is needed for a high confidence of normality, or is it possible to test whether a sample is normal with a high degree of confidence?
Note that normality is an ideal condition and not strictly necessary. The mean of the sample is the first thing to converge, and it generally does so with a fairly small sample size, followed by the tails, which generally don’t fill in until the samples get very large.
The sampling distribution is the distribution of a given sample. That distribution can differ from the population distribution, which is what the sample is intended to measure.
As an aside, you guys have no idea how much more complicated these things get when you try to debate them in a language that is not your mother tongue. Whether a word goes before or after sometimes gets a bit dizzying…
If we have a population of size 100 and we take a sample of size 40, then we have 100 choose 40 possible samples, right? Each possible sample has its own sample statistics, such as the mean of that specific sample. In other words, the sample mean is a random variable, which is different from the population mean (a parameter).
The Central Limit Theorem states that if you were to plot the mean of every single possible sample, the distribution of those sample means is approximately normal (given that we have a sample of size 40 here) REGARDLESS of the distribution of the population.
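Not from the thread, but here is a minimal sketch of that idea in Python (the population size, sample size and the skewed exponential shape are arbitrary choices for illustration). Since 100 choose 40 is far too many samples to enumerate, it just draws a large number of random samples and looks at the distribution of their means:

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately skewed "population" of 100 values (exponential is just an
# arbitrary non-normal choice for illustration).
population = rng.exponential(scale=1.0, size=100)

# 100 choose 40 is far too many samples to enumerate, so draw lots of random
# samples of size 40 without replacement and record each sample mean.
sample_size, n_samples = 40, 50_000
sample_means = np.array([
    rng.choice(population, size=sample_size, replace=False).mean()
    for _ in range(n_samples)
])

print("population mean:         ", population.mean())
print("mean of sample means:    ", sample_means.mean())  # very close to the population mean
print("std dev of sample means: ", sample_means.std())   # far tighter than the population itself
```

A histogram of sample_means comes out roughly bell-shaped even though the population itself is heavily skewed, which is the point being made above.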
Nope, that’s a t distribution. The CLT says that the normalized sum of a random variable will tend to normality as the sample size goes to infinity. Random variables from a t distribution obey the same rule, which is what you describe. You have the parent population, which you can sample; the mean of a sample can vary, so the distribution of the sample’s mean follows a t distribution.
Other random variables can be normalized too and aren’t necessarily t-distributed. t distributions arise from sampling what is suspected to be a normal distribution in the population, but in this case the parent population is logistic, because that’s what the Elo algorithm uses. It’s a moot point, because the random variable still converges to normality according to the CLT regardless of the parent population’s distribution.
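For reference, the logistic curve in question is the standard Elo expected-score formula; a minimal sketch (the 400 scale factor is the conventional chess value, individual implementations may differ):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B (a logistic curve)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

print(expected_score(1500, 1500))  # 0.50 for equal ratings
print(expected_score(1700, 1500))  # ~0.76 for a 200-point favourite
```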
It’s gonna take a bit, mostly because thanks to covid I’m working with quite a damaged laptop and can’t access my good PC, so I’m going to have to see if this toaster can actually process the dump without burning. I’ll make a thread or something when I’m done.
I’m actually not 100% sure about this, because the Elo rating of any given player is based on the sum of points won/lost, which is in essence a random walk, and a random walk’s variance grows without bound, so you’re actually more likely to get a massive outlier if the matchup is random.
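A rough sketch of that random-walk picture, with every parameter (player count, rounds, K-factor) invented for illustration: equal-skill players, completely random pairings, coin-flip results, standard Elo updates, then see how spread out the ratings end up.

```python
import numpy as np

rng = np.random.default_rng(1)
n_players, n_rounds, K = 1000, 500, 16
ratings = np.full(n_players, 1500.0)

def expected(ra, rb):
    # Standard Elo expected score (logistic in the rating difference).
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

for _ in range(n_rounds):
    order = rng.permutation(n_players)
    for a, b in zip(order[0::2], order[1::2]):  # completely random pairings
        ea = expected(ratings[a], ratings[b])
        result = rng.integers(0, 2)             # coin flip: everyone has equal "skill"
        ratings[a] += K * (result - ea)
        ratings[b] += K * ((1 - result) - (1 - ea))

print("std dev of ratings after", n_rounds, "rounds:", round(ratings.std(), 1))
```

Whether that spread keeps growing or settles down is exactly what’s being debated here, since the Elo update itself hands back fewer points the further ahead a player gets.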
That’s in essence what this is, but the players are sorted according to their average Elo rank, so this gives you an idea of how much their mirror Elo varies from their average.
When you sort by mirror elo the curves look like this:
Even if there is a massive outlier, I don’t think it will make the standard deviation grow much compared to the sheer number of players. There are also factors such as skill still being present, albeit to a smaller degree, and human fallibility, which stop the outliers from getting out of hand.
My point: there have not been enough cases (players) for the extreme outlier (a player who happens to win every PvP by selecting a random cheese or whatever he feels like that day) to show up. And even in that situation, the nature of the Elo system means that the more of an outlier a player is, the bigger the Elo difference to what they beat, so they gain fewer points per win and can’t run away as fast (quick numbers after this post).
It’s early in the morning, so there are some ramblings in there, but I hope I got my point across.
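Quick numbers on that damping effect, with K = 16 as an arbitrary K-factor: the further ahead the outlier gets, the less a win against an average opponent is worth.

```python
K = 16  # arbitrary K-factor for illustration

def expected(ra, rb):
    # Standard Elo expected score (logistic in the rating difference).
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))

for lead in (0, 200, 400, 800):
    gain = K * (1 - expected(1500 + lead, 1500))
    print(f"rating lead {lead:4d} -> points gained per win: {gain:.2f}")
```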
Pro players don’t need the balance team to consider them; they can get money from e-sports. If the sample includes too much pro-player win-rate data, common players will leave the game, and beginners too.
So you want more friends or…?
No, actually I think you were completely right at first: since Elo ratings are a random walk, if we make the matchup completely random then I would expect the distribution of Elo ratings to be much tighter.
In a sense, pro players are balance testers. They are the solution to imbalance: they want more money, so they must find the best way to deal with imbalance. Common players learn their builds from e-sports videos and streams.
Mirror Elo predicts performance with almost 90% accuracy, so if there is variability it will disappear very quickly even with a small sample of players. This data-set has about 500, so the odds that the average P/T happened to have their 10% variability above their mirror Elo while the average Zerg happened to have theirs below are incredibly low.
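A very rough back-of-envelope version of that argument, with every number assumed rather than taken from the data-set: if each player’s deviation from their mirror-Elo prediction behaved like an independent draw, the average deviation across ~500 players would have a standard error of roughly sigma / sqrt(500), so a consistent race-wide shift is hard to put down to noise.

```python
import math

sigma = 0.10  # assumed per-player variability around the mirror-Elo prediction
n = 500       # approximate number of players in the data-set
standard_error = sigma / math.sqrt(n)
print(f"standard error of the average deviation: {standard_error:.4f}")  # ~0.0045
```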
Yes, and also, if the best player won every mirror matchup, their growth as an outlier would simply be far beyond anything a random walk could produce. The best player would grow to the limit of games played, the second-best player would win every time unless they played the first, etc., making the outliers much more extreme.
I’m downloading postgresql right now on my toaster to actually process some data for once. All of this has sparked my curiosity.
Also yeah, the distributions for all mirror matchups seem to be pretty much the same. And I’m willing to bet that if you plotted Elo vs number of players for all matchups, you’d get some nice normal distributions.
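If anyone wants to check once the dump is processed, something along these lines would do it; the file name and the "matchup"/"elo" column names are placeholders for whatever the export actually ends up looking like.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("ratings.csv")               # hypothetical export from the dump
for matchup, group in df.groupby("matchup"):  # one histogram per matchup
    group["elo"].plot.hist(bins=50, alpha=0.5, label=matchup)

plt.xlabel("Elo")
plt.ylabel("number of players")
plt.legend()
plt.show()
```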