So I created a data tier list to go along with median MMRs over the past 30 days:
https://imgur.com/xBD8uxM
It's created by dividing the range between the highest and lowest median MMR (no tanks) on the final day into 5 bins (a list of 6 bin edges), then ordering specs by median MMR within each bin as well.
It's about as unbiased as you can make it, in my opinion, but it's still early in the season and things are still changing a lot (look at disc gains in the last 5 days).
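For anyone who wants to reproduce the binning, here is a rough sketch in Python. The spec names and MMR values are made up, and `tier_list` is a hypothetical helper, not the code that produced the image above. It assumes equal-width bins between the lowest and highest median.

```python
import numpy as np

# Hypothetical final-day median MMRs per non-tank spec (made-up numbers).
median_mmr = {
    "Spec A": 2050.0,
    "Spec B": 1980.0,
    "Spec C": 1900.0,
    "Spec D": 1875.0,
    "Spec E": 1815.0,
    "Spec F": 1750.0,
}

def tier_list(medians, n_bins=5):
    values = np.array(list(medians.values()))
    # n_bins equal-width bins need n_bins + 1 edges (5 bins -> 6 numbers).
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    tiers = {tier: [] for tier in range(1, n_bins + 1)}  # tier 1 = top bin
    # Walk specs from highest to lowest median so each tier ends up sorted.
    for spec, mmr in sorted(medians.items(), key=lambda kv: kv[1], reverse=True):
        # Compare against interior edges only, so the maximum lands in the top bin.
        bin_index = int(np.digitize(mmr, edges[1:-1]))  # 0 = lowest bin
        tiers[n_bins - bin_index].append((spec, mmr))
    return edges, tiers

edges, tiers = tier_list(median_mmr)
for tier, specs in tiers.items():
    print(f"Tier {tier}: {specs}")
```

A bin can come out empty if no spec's median falls in that range, which is expected with equal-width bins.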
That’s fine, I think the major impacting factor would be the top 5000 cutoff limit. If you can get around that it seems like a good way to summarize.
If you only get the top 5k, you can see that the more players play a given spec, the further into the right tail of the distribution the top 5000 ratings sit. Or more formally, P_{full distribution}(rating < median(top 5000)) is an increasing function of the number of players.
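A quick simulation shows the effect. This is only a sketch: it assumes ratings are normally distributed with the same mean and spread for every spec (a made-up assumption, real ladders are not that clean), and only the number of players changes.

```python
import numpy as np

rng = np.random.default_rng(0)
TOP_N = 5000

def top_n_median_percentile(n_players, top_n=TOP_N):
    # Assumed rating distribution: identical for every population size.
    ratings = rng.normal(loc=1500, scale=250, size=n_players)
    top = np.sort(ratings)[-top_n:]  # what a top-5000 leaderboard exposes
    top_median = float(np.median(top))
    # Empirical P_full(rating < median(top N)).
    percentile = float(np.mean(ratings < top_median))
    return top_median, percentile

sizes = (6_000, 20_000, 100_000)
results = {n: top_n_median_percentile(n) for n in sizes}
for n, (med, pct) in results.items():
    print(f"{n:>7} players: top-{TOP_N} median = {med:.0f}, percentile = {pct:.3f}")
```

Skill is identical across the three populations here, yet the top-5000 median climbs with population size, so a popular spec looks stronger purely because of the cutoff.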
You can adjust for it by choosing a sufficiently high rating threshold M and only considering the portion of the distribution above M when computing the median. M should be chosen high enough that you have complete data for players of each non-tank spec above rating M.
This has a slightly different interpretation (a tier list for people capable of achieving at least rating M) and assumes that rank 1 of spec A is not below rank 5000 of spec B for any specs A, B.
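Here is a minimal sketch of that adjustment on simulated data. The two specs are hypothetical and are given identical normal rating distributions on purpose, so any gap between their medians comes from the cutoff alone. M is taken as the highest "rank 5000" rating across specs, which is one way to satisfy the condition above, not the only one.

```python
import numpy as np

rng = np.random.default_rng(1)
TOP_N = 5000

# Hypothetical specs: same skill distribution, very different populations.
populations = {"popular spec": 100_000, "rare spec": 12_000}
visible = {}
for spec, n in populations.items():
    ratings = rng.normal(1500, 250, size=n)
    visible[spec] = np.sort(ratings)[-TOP_N:]  # all the leaderboard gives us

# M = highest "rank 5000" rating among specs: above it, no spec is truncated.
M = max(v.min() for v in visible.values())

naive = {s: float(np.median(v)) for s, v in visible.items()}
adjusted = {s: float(np.median(v[v >= M])) for s, v in visible.items()}

for spec in visible:
    print(f"{spec}: naive = {naive[spec]:.0f}, above M = {adjusted[spec]:.0f}")
```

If some spec had no players at or above M, `v[v >= M]` would be empty and the median undefined, which is exactly the "rank 1 of spec A is not below rank 5000 of spec B" assumption.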
I see, then to ensure you have enough data maybe something like 1700 or 1850. Some rating not explicitly associated with a reward where you still get a large sample size.
Damn, it actually looks like a middle finger. Nice work. The only other thing that may be good to think about is median vs. another quantile.
If you’re bored, one question that comes up all the time on these forums is the debate about representation vs. class power. There are some exogenous shocks to class power in the form of weekly patches. If you’re able to properly control for seasonal trends, you could probably show that increases in class power lead to increases in representation. The trick will be getting a measure of representation which isn’t reflective of power (e.g. something like rank 1 cutoff rather than # chars over a hard-to-achieve rating). https://en.wikipedia.org/wiki/Bayesian_structural_time_series is my favorite way to study those sorts of questions.
That's really interesting. Honestly, you will have to help in some way with an explanation. I am only good at slicing and dicing the data at the moment and doing simple graphs.
I have the rank 1 cutoffs each time I grab data for the current season. Is there a formula/function in R/RStudio that you know of?
In Python the library is CausalImpact, and there’s an R version too: https://cran.r-project.org/web/packages/CausalImpact/vignettes/CausalImpact.html
For the synthetic counterfactual here, representation of tank specs may be a good one. It’s pretty rare that tanks receive PvP changes, but general game popularity and world events probably affect tank players the same way. The basic idea is that you want two time series: one which is not affected by the shock (the balance patch) and one which is, such that the control time series is predictive of the test time series in the period before the shock. You then assume that relationship would have continued after the shock, and read the effect off the gap between what the control predicts and what actually happened.
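To make the control/test idea concrete, here is a stripped-down sketch on made-up data. It is not CausalImpact and not a Bayesian structural time series: it swaps in a plain least-squares fit of the test series on the control series over the pre-patch period, then uses that fit as the counterfactual afterwards. All numbers (60 days, patch on day 40, a lift of 5) are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_days, patch_day, true_lift = 60, 40, 5.0
days = np.arange(n_days)

# Shared popularity trend that moves both series (made-up data).
popularity = 100 + 10 * np.sin(days / 8) + rng.normal(0, 1, n_days)
control = popularity + rng.normal(0, 0.5, n_days)          # e.g. tank representation
test = 0.8 * popularity + 10 + rng.normal(0, 0.5, n_days)  # e.g. the patched spec
test[patch_day:] += true_lift                              # effect of the patch

pre = slice(0, patch_day)
post = slice(patch_day, n_days)

# Fit test ~ a * control + b using the pre-patch period only.
a, b = np.polyfit(control[pre], test[pre], deg=1)

# Counterfactual: what the test series would have looked like without the patch.
counterfactual = a * control[post] + b
estimated_lift = float(np.mean(test[post] - counterfactual))
print(f"estimated lift: {estimated_lift:.2f} (simulated lift: {true_lift})")
```

The real packages do the same thing in spirit but model trend and seasonality explicitly and give you uncertainty intervals around the estimated effect, which this sketch does not.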