I have seen many cases where people use math and statistics, but…
" I have 100% win rate! " (only 1 game played)
" HS rigged pack opening! " (1 leg from 20packs open)
This leads me to a question…
What is the baseline for each scenario?
e.g. how many games does a deck need to be played to substantiate its true win rate?
how many games does it take to prove the game is ____________?
In both of these cases, I asked about confidence intervals for the binomial distribution — usually you’d want some software or at least tables (sounds old-fashioned, yes) to calculate such things. Speaking of which:
Your question is directly related to the issue raised (see above). In case you didn’t get it, the ‘true’ answer is something along these lines: the win rate lies in the interval (expected value ± stochastic error) with X% probability (the confidence level); that interval is the ‘confidence interval’, sometimes measured in ‘sigmas’, though usually for the normal (Gaussian) distribution, not the binomial one, which is what I was asking about.
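For the record, a minimal sketch of that recipe with the usual normal approximation (the win/loss numbers are made up, and see further below for why this approximation is questionable here):

```python
import math

def wald_interval(wins, games, z=1.96):
    """Normal-approximation confidence interval; z = 1.96 gives ~95%."""
    p = wins / games
    err = z * math.sqrt(p * (1 - p) / games)  # the 'stochastic error'
    return p - err, p + err

print(wald_interval(55, 100))  # ~ (0.45, 0.65): wide, so 100 games say little
```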
As for the reaction, well, what do you know? Not a single substantial answer from the forum’s self-anointed maths ‘gurus’, who’d usually scrunch their noses and talk about ‘SaMpLe SiZeS’ as if they understood anything about it. Verdict: either the local audience has got no clue, or those few who do are completely silent.
Quite a few. You need enough games to capture the overlapping effects of a lot of things.
Your card draws
Your RNG effects like discovers
Your opponent’s card draws
Your opponent’s RNG effects like discovers
Then repeat those last 2 for every single deck that’s common in the meta
AND also make sure that your class matchup spread is representative of the meta’s current spread (which is often shifting).
It requires kind of a ton of games to get a deck’s “true” win rate.
Finding the odds of a legendary in a pack is a good deal easier. You just open a pack and check whether there’s a legendary in it or not. The only confounding factor is that you want to make sure you aren’t counting the “pity timer” legendaries that you’re guaranteed after 40 packs without one.
For that, stats is pretty good at calculating the number of packs you’d need to open to be 99% confident in the drop rate of legends. It’s just been a while since I’ve done that myself.
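If it helps, here’s a rough sketch of that calculation, assuming the normal approximation and a guessed ~5% legendary rate (the numbers are illustrative, not Blizzard’s):

```python
import math
from scipy.stats import norm

def packs_needed(p_guess=0.05, conf=0.99, margin=0.01):
    """Packs to open so the estimated drop rate is within +/- margin at conf level."""
    z = norm.ppf(1 - (1 - conf) / 2)  # ~2.576 for 99% confidence
    return math.ceil(z**2 * p_guess * (1 - p_guess) / margin**2)

print(packs_needed())  # ~3,152 packs for +/-1% at 99% confidence
```

And, as noted above, you’d want to exclude pity-timer legendaries from the sample first.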
The general idea in statistics is that you can only start extracting probabilities once you reach a really large sample size.
“Large” generally starts at several thousand.
Even for something as simple as tossing a coin, where you expect 50/50 in the long run, 1,000 iterations can still miss that expectation.
I have a Google sheet that demonstrates it: https://docs.google.com/spreadsheets/d/1sGYAejN6fePSVGp2-_kWNSHWLd-GtAWz2qxwBAzZogE/edit?usp=sharing
Each time you refresh the page it tosses a thousand coins, but you will sometimes get a 48/52 or a 49/51 split, even though the tosses are generated with equal probabilities.
And the smaller your probabilities are, the harder they are to recover from your statistics.
What you can extract from small samples, though, is anomalies. If an event is supposed to have a 0.1% chance of happening, but it occurs 4 times in a row in your small sample (a one-in-a-trillion coincidence, since 0.001⁴ = 10⁻¹²), that is enough to question the exactness of the claimed probability.
The brain tends to focus on small, short-term samples and to emphasize bad results. That’s a really common bias that is hard to spot and avoid (and also hard not to call out at the wrong times).
Again in the coin-toss sheet, I included the longest streak of the same outcome happening in a row.
If your sample is that one time when it landed 10 tails in a row, you will claim that it’s rigged and the outcome is always tails, even if the big picture clearly indicates otherwise.
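A minimal Python sketch of what the sheet does, assuming a fair coin and 1,000 tosses per run:

```python
import random

def toss_run(n=1000, p=0.5):
    """Toss n coins; return the heads count and the longest streak."""
    heads = 0
    longest = streak = 0
    prev = None
    for _ in range(n):
        side = random.random() < p  # True = heads
        heads += side
        streak = streak + 1 if side == prev else 1
        prev = side
        longest = max(longest, streak)
    return heads, longest

heads, longest = toss_run()
print(f"heads: {heads}/1000, longest streak: {longest}")
```

Run it a few times: 48/52 splits and streaks of 8 or more show up regularly, with a perfectly fair coin.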
Think of it as a subjective win rate (your personal deck’s win rate) versus the objective win rate (the actual real win rate combined across all people using that deck).
You would need to play enough games for your subjective win rate to substantiate that the deck’s true win rate is X, where X = the objective win rate.
That’s the only way.
Thus, you can never use your own deck’s win rate to calculate what the overall win rate of the deck is unless you are the ONLY one playing the deck.
So if deck A has an objective win rate of 55%
you can play deck A for 100 games, sit at 55%, and match the “true winrate”, but you can also play deck A for 1,000 games, sit at a 54% win rate, and not match the “true winrate”.
This is why you can’t use your own deck’s win rate based on your subjective data.
And this is why there is no real answer to “how many games do I need to play a deck to get its true winrate?” unless you are the ONLY one playing that exact deck.
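To put a number on that scatter, here’s a hypothetical simulation: 1,000 players all piloting a deck whose true win rate is assumed to be exactly 55%:

```python
import random

TRUE_WR = 0.55  # assumed objective win rate

def observed_rate(games):
    wins = sum(random.random() < TRUE_WR for _ in range(games))
    return wins / games

for games in (100, 1000, 10000):
    rates = sorted(observed_rate(games) for _ in range(1000))
    print(f"{games:>5} games per player: "
          f"middle 95% of records span {rates[25]:.3f} to {rates[975]:.3f}")
```

Even at 1,000 games each, individual records still spread roughly from 52% to 58%, so two honest players can report quite different “win rates” for the same deck.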
It depends on what the finishing word is here. To prove the game is what?
If you are wanting the word to be “rigged” then there isn’t a set number of games. A number of games won’t prove the game is rigged.
The way you prove the game is rigged is by making a claim, and then proving demonstrably that the claim is backed. The key word here is demonstrably. And this is where all “rigged” things fall apart. No one has ever produced a claim that you can demonstrate and have others demonstrate as well. People think their personal experience is a demonstration and it’s not.
It needs to look like: “If I play A and press Play, I will be matched with B. You can go into the game and do that right now, and we will ALL get the same results, thus proving that playing A matches you to B.” And this has never happened, not once. All we get is “well, it happens to me!” with zero video or recording evidence.
Blah, blah, blah. Not a single formula! Or even the slightest notion of what ‘large’ etc. actually is, as noted above.
For starters, check something like this out: https://en.wikipedia.org/wiki/Binomial_distribution
In particular: https://en.wikipedia.org/wiki/Binomial_distribution#Confidence_intervals
and also https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
(WikiDumpster might not be the best or most accurate source, but it’s just a pointer to the proper subject and to how these things work in general.)
In the posts linked above, I’ve already noted that approximating the (binomial) distribution with the normal one probably isn’t a very good idea here.
It’d be an okay little project for a maths student to implement those formulae in a small program or something like that; doing it by hand would be a bit annoying, which is why I asked. If you don’t wanna bother, well, for rather large sample sizes (although you’ve likely got no notion of what that actually means) you could go with a crude approximation by the Gaussian distribution and use the ‘three-sigma’ recipe I hinted at earlier.
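Since it really is a small project, here is a sketch of the exact (Clopper-Pearson) interval via the beta quantiles from scipy; the win/loss numbers are, again, made up:

```python
from scipy.stats import beta

def clopper_pearson(wins, games, conf=0.95):
    """Exact binomial confidence interval for an observed win rate."""
    alpha = 1 - conf
    lo = beta.ppf(alpha / 2, wins, games - wins + 1) if wins > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, wins + 1, games - wins) if wins < games else 1.0
    return lo, hi

print(clopper_pearson(55, 100))    # ~ (0.447, 0.650)
print(clopper_pearson(550, 1000))  # ~ (0.519, 0.581)
```

Note how tenfold more games shrinks the interval by only about a factor of three, i.e. roughly the square root of ten.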
When you grasp at least the basics, we could talk — and the topic-starter has explicitly proposed to talk about MATHS… But, of course, those VERY VOCAL forum participants are eager as ever to contribute their oh-so-valuable clueless ‘opinion’ instead.
By the way, you’ve apparently got no notion of what ‘statistics’ even means.
For starters:
https://en.wikipedia.org/wiki/Statistic
(Yes, the definition there in the beginning is more or less correct.)
Ugh… That’s a… creative way to put it. You might as well go on and tell us there’s no such thing, that a ‘win rate’ is a ‘social construct’ or whatnot. It’s fine and all, everyone is entitled to their own opinion, but this isn’t a mathematical approach, if you ask me.
The question is: how many?
Sorry for being curt, but you’ve literally said nothing about it, despite being verbose in your post.
Nope, that’s not how it works.
As said many times, a ‘proper’ way (well, at least one of them; there are probably alternative approaches) would be to estimate the confidence interval, for which the player’s own data is as good as any, really; the amount of data is related only to the interval’s width, nothing more, nothing less. The ‘objective’ figure still remains one to be estimated.
This entire question indicates the wrong attitude.
Every sample is less than the entire picture. So no sample is perfect. But some samples are better than others. Maybe it’s because of larger sample size, maybe it’s because one sample doesn’t have a particular bias that another sample has. But there is absolutely no binary line where “true” winrate is established. There are only better samples and worse samples, on a spectrum.
That said, in a world where data websites professionally collect data and for the most part make it available for the low low cost of free with ads, it’s downright silly to consider ANY personal play sample as “good.” It’s competing against samples of thousands or tens of thousands. It’s like trying to build your own car when Toyota exists.
Biases apart, it’s not so much about a ‘spectrum’, but rather about how wide the confidence interval for the estimated ‘true’ win rate is.
In the example linked above, I provided an estimate that it’s roughly proportional to the inverse square root of n (number of data points) in this case.
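In symbols (the normal-approximation version, as a sketch, with $\hat p$ the observed win rate and $z$ the chosen number of ‘sigmas’):

$$ \text{width} \approx 2z\sqrt{\frac{\hat p\,(1-\hat p)}{n}} \propto \frac{1}{\sqrt{n}} $$

So, for instance, quadrupling the number of games roughly halves the interval.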
Not if you’re testing the theory that the game is rigged, for example. In that case, if you average this over a large number of players, any such effects would cancel out.
Besides, one should still be mindful of things like ‘the average body temperature in a hospital’ (some patients have a high fever, others are already cooling off in the fridge, yet we seriously use this ‘metric’ as an indicator of average patient health), to begin with some silly and basic examples, or Simpson’s paradox, if you want something slightly more advanced and mathematical rather than pure common sense.
The issue here is that you can’t really prove a negative with data. I mean, I have said that the data shows that the game is 95+% not rigged, and that this is objective fact. And I stand by that statement… but the 95% part is pretty crucial. If the output is random and we have high confidence in its conclusions based on sample size, then we can pretty much say random output proves randomish method and no widespread rigging. But if a few dozen games out of the hundred thousand are rigged, well, there’s really no way to detect that with data. There is no such thing as a 100% confidence interval.
It’s kinda like Bigfoot. I can’t prove that Bigfoot doesn’t exist. But if you’re trying to tell me that Bigfoot is commonplace, well that would be objectively false. Similarly, I can’t prove that zero rigging exists, but it is an objective fact that it’s not common.
It’s absolutely entirely about a spectrum. Confidence interval is just a way of measuring how far along that spectrum you are, and 100% confidence doesn’t exist.
This is essentially a claim of unfalsifiability, which immediately makes it scientifically invalid. If you can’t perform some test of the output in such a way that you get one result in the data if rigged and another result of not rigged, then your hypothesis is untestable and that’s not science, that’s religion.
If the output matches random, then it’s not rigged, because any rigging that doesn’t affect results isn’t rigging at all.
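That kind of test is easy to sketch; for instance, with made-up numbers and scipy’s exact binomial test:

```python
from scipy.stats import binomtest

# Hypothetical: 5,100 wins in 10,000 games of a matchup believed to be 50/50.
result = binomtest(5100, n=10000, p=0.5)
print(result.pvalue)  # ~0.05: barely distinguishable from a fair coin
```

A small p-value says the output doesn’t match random; otherwise, whatever rigging might remain is too small to show up in the data, which is exactly the 95% caveat above.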
‘Spectrum’ as a mathematical term means an entirely different thing (which suddenly reminded me of the ‘standard variance’ incident, apparently an erroneous conflation of ‘standard deviation’ and ‘variance’, by the way). If you’re trying to explain ideas in your own words, that’s fine, but let’s just not confuse that with (mathematical) terminology.
By the way, technically, it does exist: the entire domain of the distribution function. Not that it’s very useful practically, of course…
Anyway, since you’ve mentioned it again: that by itself is not necessarily a problem, including for mathematically rational decision-making (all that Bayesian ‘magic’ and whatnot… I think we’ve touched on the subject slightly already).
No, it’s not. I’m merely pointing out that, on the most trivial level, in a zero-sum game like this, ‘positive’ and ‘negative’ riggings across many players would cancel out, and you wouldn’t see the effect; moreover, amalgamating data sets might introduce a new ‘bias’ (have you taken a look at the aforementioned Simpson’s paradox? I’d highly recommend it if you’re not familiar, it’s a good one), so simply having ‘bigger data’ does not necessarily make your results better; sometimes, in fact, quite the contrary. That was my point.
As a reference, even this would do:
https://en.wikipedia.org/wiki/Simpson%27s_paradox
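A made-up win-rate illustration of the paradox (all numbers hypothetical): deck A beats deck B within each rank bracket, yet loses in the pooled totals, because A is played mostly in the harder bracket.

```python
# (wins, games) per deck per bracket -- hypothetical numbers
brackets = {
    "low ranks":  {"A": (90, 100),   "B": (800, 1000)},
    "high ranks": {"A": (300, 1000), "B": (20, 100)},
}

totals = {"A": [0, 0], "B": [0, 0]}
for name, decks in brackets.items():
    for deck, (wins, games) in decks.items():
        totals[deck][0] += wins
        totals[deck][1] += games
        print(f"{name:>10} {deck}: {wins / games:.0%}")  # A wins both brackets

for deck, (wins, games) in totals.items():
    print(f"{'overall':>10} {deck}: {wins / games:.0%}")  # yet A: 35%, B: 75%
```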
A side note: I’ve been berating this dubious ‘source’, and for a good reason: if you look at anything even slightly sensitive, political, etc., WikiDumpster looks heavily curated at best and like utter totalitarian propaganda at worst, so I wouldn’t trust it one bit. However, a number of purely technical articles in the US version have apparently been written by students, enthusiasts, etc., citing actual decent books and literature, so some of the material there isn’t that bad (and such articles wouldn’t interest many people but ‘nerds’ anyway, so there’s little point in machinations there); in fact, it can occasionally be useful.
PS One more thing… Can’t believe I’m saying it, but I’m really glad it’s you who showed up in this topic. On a personal level, you might be… you, but at least you can operate with facts and rational arguments (when not trolling and such, that is); there’s that, and I do like conversations of this kind.
I didn’t say you would prove a negative. My example wasn’t to show how to prove the game isn’t rigged. That’s the negative. My example was to show how it is rigged, which isn’t a negative.
“prove the game isn’t rigged” = proving a negative claim
“prove the game is rigged” = proving a positive claim
So I don’t even know what you’re talking about when you reply to me saying “the issue here is that you can’t really prove a negative with data.” Yeah, I know. That’s why I didn’t say anything about proving a negative.
I did say how many: until your subjective data matches the objective data. Which is essentially meaningless because…
Yes, it is.
…your single deck experience will never be enough to make an overall case for the deck. Thus, you can’t take a single iteration of data within a whole to determine what the whole is.
So if you want a number, it’s whatever number matches the overall data. For some deck iterations it will be 100 games. For some deck iterations it will be 1000. For some it won’t get there even with 1,000,000 games. So it could be a number you reach and it could be a number you never reach.
This is basic statistics.
Given a pool of 1,000,000 participants (or whatever number you choose), some participants aren’t going to experience the average participant experience.
So asking “how many games until I see ______?” is like asking how many games a participant has to play to see the average participant experience. There is no number. And asking for a number just shows a lack of knowledge about how statistics works. You may as well ask what number you count to in order to get to infinity.
And if you understand how statistics works and why this is the way it is, and you continue to say “The question is: how many?”, then you are being dishonest. I’m going to assume you’re not dishonest and instead assume you’re just lacking understanding.
No formula, only empty words and assertive self-repetitions.
No, it isn’t.
I dunno, get a decent university degree, take some courses or whatever… There are even books after all, although it might be hard for a layman to navigate through this rather complex subject.
Then it’s a perfectly reasonable question in terms described above.
In practice, such a question is raised, perhaps implicitly, all the time. “We’re seeing a peak in the spectrum here [talking about an example from particle physics again] at ‘three sigmas’; could that really be it? Well, let’s gather data up to ‘five sigmas’, and then we’ll see whether it disappears or we’ll claim a discovery”, something like that.
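For reference, the rough correspondence between those ‘sigmas’ and tail probabilities (one-sided, per the usual convention):

```python
from scipy.stats import norm

for sigmas in (3, 5):
    print(f"{sigmas} sigma: one-sided p ~ {norm.sf(sigmas):.1e}")
# 3 sigma: ~1.3e-03 (suggestive), 5 sigma: ~2.9e-07 (discovery threshold)
```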
With enough games, it can be a pretty good measure of how that deck operates under certain conditions.
For example, you state your rank, how variable that rank is (its average deviation, let’s say; not trying to dabble in exact terminology, just how much it oscillates around your current rank), and a couple more variables just in case (server of play, your usual finishes), which can under some circumstances impact the rank or win rate (for example, a fresh account/server = a lot of bots; a non-fresh account on an old server = more precise data)…
…and then you can trust your data more than what you read from aggregate data websites, because their data isn’t collected under those specific conditions only, but under other conditions too, which impacts its precision.
If I can hit, for example, top 200 with sludgelock with a win rate of 59% consistently on multiple servers (obviously I can’t, not anymore), that means that everyone under those circumstances can switch to sludgelock and reach that same statistic, given enough games and unchanging circumstances (no meta shifting).
That’s also, btw, why I always include my rank when I talk about stuff. It’s not because I like to flex, but because I feel those are necessary additions to explain under which circumstances that’s possible.
I mean I do like to flex, but I rather open a new thread to do it, rather than use other topics.
Nothing vague and vacuous here, this is practical, it’s not theoretical.
You’re continuously trying to apply actuarial mathematics in a place where it has no business being. This is basic statistics, and it’s actually useful, unlike your failed attempts at something you clearly don’t understand, if it has brought you to the idea that the game is rigged.
You can feel flattered I even replied to you. It won’t happen again.
Lo and behold: some anonymous guy on the forum has just claimed to have overturned the entire progress of humanity in this particular discipline, which deals with exactly such problems as this one. Well, how do I put it: I don’t think so.
Ha! ‘Remember this day, for you are graced with my presence.’
Don’t fall for the trap. There isn’t a set number. Saying “with enough games” hints that a specific number exists. No such specific number exists statistically, and that’s exactly what he wants you to give.
This is why I’m now avoiding further discussion with this individual. They are either purposefully dishonest or stubbornly ignorant. Despite giving the benefit of the doubt earlier, I’m no longer certain which.
It’s asking an unanswerable question, similar to asking someone to name the exact person in history who started speaking Italian. No such person exists, even though Italian had a beginning. And when you say no such person existed, he acts like he wins the argument because you can’t answer it, when the answer simply doesn’t exist. It’s one of the lamest tactics in debates, and that’s exactly what he’s doing.
It’s dishonest. There is no set number statistically.
The biggest of these circumstances is that you are you, and very few if any people are like you. It is a critical error to simply assume that oneself is normal, and that therefore one’s experience can be generalized as normal. You need to first investigate what normal is, and see to what extent you match it.
You can’t just assume that something conforms to a bell curve. I’d imagine that if everyone played the same number of games, then a random win-loss record per player would fit such a distribution… but players do not play the same number of games, so I don’t see where there’s a binomial distribution to be had.
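A quick sketch of that last point, assuming made-up game counts and a flat 50% true win rate for everyone:

```python
import random

# Most players play a handful of games; a few grind thousands.
game_counts = [10] * 700 + [100] * 250 + [1000] * 50

win_rates = []
for n in game_counts:
    wins = sum(random.random() < 0.5 for _ in range(n))
    win_rates.append(wins / n)

# The 10-game records alone span roughly 0% to 100%, so pooling them with
# the 1,000-game records gives a lumpy mixture, not one clean bell curve.
print(f"min {min(win_rates):.2f}, max {max(win_rates):.2f}")
```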