I’ll start by saying I agree with capping paragon and gem levels for some PTR testing. However, it should be noted that some builds only become viable at high paragon, because GR difficulty and the damage/toughness gained from primary attributes scale asymmetrically. With a capped PTR, these builds would go unnoticed and could become problematic in higher GRs. For an in-depth example of this asymmetric scaling, see my post on Dexterity scaling vs GR progression.
Next, as a mathematician, I always appreciate players using math to support their decisions and opinions. However, there are a number of flaws in your appendix:
- Is the data truly Gaussian? (I think actual player data might be skewed, with different variances for each class.)
- Does the data treat “Barbarian” and “Wizard” as a single curve each, covering every possible build?
- How is paragon incorporated into these graphs? Players of a weaker class generally earn less paragon per season than players of other classes, unless they have multiple characters.
These simplifications make it almost impossible to tell the whole story with just a few plots. Do you have access to recent (current era), exact data, meaning the specific distribution by class, build, paragon, etc.?
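To illustrate the first bullet, here is a minimal Python sketch of how one could check for skew and unequal variances if per-class data were available. The function names and the sample GR tiers are made up for illustration and are not real leaderboard data. Skewness near 0 is consistent with a symmetric (for example Gaussian) distribution, while a clearly positive or negative value is not.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = statistics.fmean(values)
    # Second and third central moments (population form, divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5


def summarize_by_class(clears):
    """Return mean, variance and skewness of GR tiers for each class."""
    return {
        cls: {
            "mean": statistics.fmean(tiers),
            "variance": statistics.pvariance(tiers),
            "skewness": sample_skewness(tiers),
        }
        for cls, tiers in clears.items()
    }


# Hypothetical GR tiers cleared, for illustration only (NOT real data).
example_clears = {
    "Barbarian": [80, 85, 90, 95, 100],  # symmetric around 90
    "Wizard": [70, 72, 75, 78, 110],     # one outlier gives a long right tail
}

for cls, stats in summarize_by_class(example_clears).items():
    print(cls, stats)
```

With real data, the same summary could be split further by build and paragon bracket, which is where a single curve per class would hide the most.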
As someone who has analyzed D3 game balance from a PVP perspective, I am confident it is not possible to balance the game with flat percentage buffs/nerfs to damage and/or toughness.