I want to describe how Vicious Syndicate (VS) and HSReplay collect data, because there have been some claims in this thread that the collection is biased.
HSReplay
HSReplay lets people download a program that records their games. This program has algorithms that (try to) identify both the recorder’s deck archetype and their opponent’s, then saves winrate data from the recorder’s point of view. It counts both recorder and opponent archetypes towards popularity.
This method of data collection is roughly accurate, but it has flaws. Recorders win more on average than a random sample of Hearthstone players, particularly at lower ranks, so the winrates displayed on HSReplay are higher than the winrates the playerbase as a whole actually achieves, especially at low ranks. This can be extremely misleading for people who visit the site without purchasing a subscription, as the only free rank filter is Bronze through Gold. In other words, the free Standard data is of little value at best, and misleads viewers into fits of rage at worst. At higher ranks, which require a subscription to view, this warping effect is reduced as the average opponent’s skill comes closer to equalling (or perhaps exceeding) the average recorder’s.
Additionally, because recorders make up half of the players in the recorded games, and recorders are not a random sample of deck archetypes, the results are slightly skewed by their deck selection.
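To make the counting concrete, here is a rough sketch of the tallying described above. It is not HSReplay’s actual code: the game records, the deck names "A", "B" and "C", and both function names are made up for illustration.

```python
from collections import Counter

# Made-up sample of recorded games: (recorder_deck, opponent_deck, recorder_won).
# Deck names "A", "B", "C" are placeholders, not real archetypes.
GAMES = [
    ("A", "B", True),
    ("A", "A", False),
    ("A", "C", True),
    ("B", "C", True),
]

def popularity_both_sides(games):
    """Share of all deck appearances, counting recorder and opponent alike."""
    counts = Counter()
    for recorder, opponent, _ in games:
        counts[recorder] += 1
        counts[opponent] += 1
    total = sum(counts.values())
    return {deck: n / total for deck, n in counts.items()}

def recorder_winrates(games):
    """Winrate per deck, seen only from the recorder's side of each game."""
    wins, played = Counter(), Counter()
    for recorder, _, won in games:
        played[recorder] += 1
        wins[recorder] += int(won)
    return {deck: wins[deck] / played[deck] for deck in played}

print(popularity_both_sides(GAMES))
print(recorder_winrates(GAMES))
```

In this toy sample the recorders favour deck A, so it accounts for half of all deck appearances even though opponents brought it in only one game out of four. That is the deck selection skew in miniature.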
It’s possible for paying customers to be misled as well. Paid filters allow viewers to narrow data down to the past 24 hours, or to the past 3 days, in addition to the standard time filters of 7 days and since the most recent expansion or balance patch. The 1 day filter in particular can produce misleading results: it reduces the sample size to the point where one archetype or another can have a particularly lucky or unlucky day and seem considerably better or worse than it actually is. This is further compounded by the flaws described above.
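For a feel of how much sample size matters, here is a back-of-the-envelope sketch using the usual normal approximation for a proportion. The game counts are invented, the function name is hypothetical, and it assumes games are independent, which real ladder data only roughly satisfies.

```python
import math

def winrate_margin(winrate, games, z=1.96):
    """Approximate 95% margin of error for an observed winrate.

    Uses the normal approximation z * sqrt(p * (1 - p) / n).
    """
    return z * math.sqrt(winrate * (1 - winrate) / games)

# Invented game counts: one day of data vs. seven days at the same daily volume.
one_day = winrate_margin(0.50, 400)
seven_days = winrate_margin(0.50, 2800)

print(f"1 day:  +/- {one_day:.1%}")
print(f"7 days: +/- {seven_days:.1%}")
```

With these made-up numbers, a deck that is truly 50% could plausibly show anywhere from about 45% to 55% on the 1 day view, while the 7 day view narrows that to roughly two points either way.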
The main advantage of HSReplay is that it updates continuously with a 7 day filter.
VS Data Reaper numbered reports
Vicious Syndicate created a process that circumvents the key flaws of the HSReplay process described above. VS only uses one piece of data from the recorder’s side of recorded matches: the winrate of their deck archetype specifically in the matchup against the opponent’s archetype, also known as matchup winrate. VS only uses opponent archetype, not recorder archetype, in determining archetype popularity, so if recorders queue against random opponents, that should be a random sample. Furthermore, instead of counting overall winrates directly from recorded games, they calculate “expected winrate” indirectly by taking all of the matchup winrates and computing an average weighted by archetype popularity to estimate overall deck winrates.
The VS data collection method is overall far more accurate than HSReplay’s. It doesn’t inflate winrates as much because the recorders aren’t overrepresented as much. It presents an almost flawless depiction of the popularity of archetypes. VS Data Reaper numbered reports cover roughly 7 days (often starting roughly 8 or 9 days before report publication), so they draw on a good sample size. These numbered reports are basically the statistical gold standard.
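The two ideas in that paragraph, opponent-only popularity and a popularity-weighted average of matchup winrates, can be sketched as follows. This is my own illustration, not VS’s code: the sample games, the placeholder deck names and the function names are assumptions, and details such as how VS treats mirror matches or low-sample matchups are not covered here.

```python
from collections import Counter

# Made-up recorded games: (recorder_deck, opponent_deck, recorder_won).
GAMES = [
    ("A", "B", True),
    ("A", "A", False),
    ("A", "C", True),
    ("B", "C", True),
]

def popularity_opponents_only(games):
    """Archetype popularity estimated from the opponents' decks alone."""
    counts = Counter(opponent for _, opponent, _ in games)
    total = sum(counts.values())
    return {deck: n / total for deck, n in counts.items()}

def expected_winrate(matchup_winrates, popularity):
    """Average of one deck's matchup winrates, weighted by opponent popularity."""
    weight = sum(popularity[opp] for opp in matchup_winrates)
    weighted = sum(wr * popularity[opp] for opp, wr in matchup_winrates.items())
    return weighted / weight

popularity = popularity_opponents_only(GAMES)

# Hypothetical matchup winrates for deck "A" against each opposing archetype.
deck_a_matchups = {"A": 0.50, "B": 0.60, "C": 0.40}

print(popularity)
print(round(expected_winrate(deck_a_matchups, popularity), 3))
```

Note that the recorders’ own deck choices never enter the popularity figures here; they only contribute the matchup winrates.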
The main disadvantage of VS numbered reports is that, being carefully curated, they are not continuously updated.
If VS numbered reports have a second flaw, it’s editorializing. The authors add explanations of the data that can exaggerate or understate the truth within the numbers themselves. In the vast majority of cases these exaggerations are quite slight, but it’s common to see third parties quote them and then exaggerate their points a little bit further, and so on, like a game of “telephone.” I’ve often seen untruths about the meta perpetuated based not on the numbers, but on the words of the VS team. However, the raw data is impeccable.
VS Data Reaper Live
VS also has a live reporting option. Data Reaper Live reports on the last 24 hours of data collected, using the same basic statistical standards described above. However, for calculation simplicity, it only uses the top 15 archetypes, and because it runs on expected winrate, decks outside the top 15 don’t factor into the winrates of the top 15 decks at all.
Furthermore, Data Reaper Live inherits archetype definitions from the Data Reaper numbered reports, instead of the other way around. Because these archetype definitions are not quickly updated by the VS team following a new expansion or miniset, Data Reaper Live tends to use outdated archetype definitions from the previous meta in trying to identify decks until the first numbered report comes out. This essentially makes data unusable during that timeframe.
Even under the best of conditions, VS Data Reaper Live data is deeply flawed for the purposes of balance change discussion. It’s based on just the past day, which makes it just as vulnerable as the 1 day filter on HSReplay to the coincidence of archetypes having lucky or unlucky days. And unless the meta particularly lacks diversity, only considering the top 15 archetypes in calculating expected winrate can lead to misleading results. For example, at the beginning of Alterac, more than a third of the meta was decks not in the top 15.
In my opinion, Data Reaper Live is almost worthless, ranking below a properly filtered HSReplay. It may have some narrow utility if one specifically wants to know today’s metagame as opposed to yesterday’s, and it is better (and cheaper) than HSReplay’s 1 day filter, if that’s one’s aim, particularly at top Legend where archetype diversity tends to suffer. But it’s all but useless at describing a metagame as a relatively static entity, particularly at lower ranks. It’s the equivalent of day trading stocks instead of analyzing overall market trends.
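Here is a small sketch of why dropping the tail of the meta can matter. All numbers are invented, the function name is hypothetical, and I’m assuming the remaining weights are simply renormalized after the cut; I don’t know exactly how Data Reaper Live handles that step. A cutoff of 2 stands in for the real cutoff of 15 to keep the example short.

```python
def expected_winrate_top_n(matchup_winrates, popularity, top_n=None):
    """Popularity-weighted winrate, optionally using only the top_n most popular decks."""
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    field = ranked if top_n is None else ranked[:top_n]
    weight = sum(popularity[opp] for opp in field)
    return sum(matchup_winrates[opp] * popularity[opp] for opp in field) / weight

# Invented meta: two popular decks plus a long tail adding up to 35% of games.
popularity = {"X": 0.40, "Y": 0.25, "T1": 0.15, "T2": 0.10, "T3": 0.10}

# Invented matchups for some deck that beats X but loses to the tail decks.
matchups = {"X": 0.55, "Y": 0.50, "T1": 0.40, "T2": 0.40, "T3": 0.40}

full_field = expected_winrate_top_n(matchups, popularity)
top_two = expected_winrate_top_n(matchups, popularity, top_n=2)

print(f"full field: {full_field:.1%}")
print(f"top 2 only: {top_two:.1%}")
```

In this toy meta the deck sits below 50% against the full field but above 50% once the tail is ignored, which is the kind of distortion a third of the meta going missing can produce.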
I hope this wall of text proves enlightening to someone. Also, thanks to RidiculousHat for generously giving his time to clarify many details of VS processes, and apologies if I got any part wrong.