If sim iteration values are not normally distributed after thousands of iterations (which is why you run so many iterations in the first place), then the result has lost its meaning at that point. If I run 1000 iterations but the distribution is negatively skewed, can you really say that the mean of that sample accurately answers the question you are asking of it?
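To make the skew point concrete, here is a rough Python sketch. Everything in it is made up for illustration: `simulate_dps` is a hypothetical stand-in for a sim (a 100k ceiling minus a random penalty), not how any real simulator works. It just shows how a long tail of bad iterations drags the mean below the median.

```python
import random
import statistics


def sample_skewness(values):
    """Moment-based skewness; negative means a long tail of low values."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def simulate_dps(iterations, seed=0):
    """Hypothetical sim: a 100k DPS ceiling minus an exponential penalty.

    The exponential penalty (mean 5k) is an assumption chosen only to
    produce a negatively skewed distribution.
    """
    rng = random.Random(seed)
    return [100_000 - rng.expovariate(1 / 5_000) for _ in range(iterations)]


results = simulate_dps(1000)
mean_dps = statistics.fmean(results)
median_dps = statistics.median(results)
skew = sample_skewness(results)

print(f"mean:   {mean_dps:,.0f}")
print(f"median: {median_dps:,.0f}")
print(f"skew:   {skew:.2f}")
```

With a distribution shaped like this, the mean sits below what a typical iteration actually produced, so quoting the mean alone can mislead. Looking at the median and the skew alongside it is one way to sanity check.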
Again, as many people have stated, numeric values such as representation numbers or sim values are not meant to be taken in a vacuum, but many of your arguments have been doing exactly that. That's not to say the representation numbers are meaningless per se, but a lot of the assumptions you make based on that data are a bigger stretch.
One example: the representation numbers for keys above a 20 are more a statement about overall game design. Enhance's lack of defense is meaningful because the game is designed to have a large number of one-shots and high single-hit damage events instead of rot damage, due to how healing kits function (further blame can be pointed at how talents work now as well). But representation data in general is going to be skewed towards Ret anyway, because people just like to play Ret regardless of performance, and Enhance is historically not popular. To give you an example from Mythic last season: Enhance was performing better than Ret overall, yet the two still had a similar number of parses. Looking at it in a more targeted manner, the 10.1.5 data shows 877 Enhance parses for Mythic Sark vs 1275 Ret parses.
The logic you have been presenting so far in this thread would say that Ret is somehow the better spec in multiple facets on Mythic Sark and that there are huge problems with Enhance Shaman, when in reality Enhance was 2nd from the top and Ret was 4th from the bottom in damage. Yet Ret had about 400 more parses for the fight. Turns out Deus Vult lives on regardless of performance.
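For anyone who wants the gap spelled out, here is the arithmetic on the two parse counts quoted above. It uses only those two numbers, and the variable names are mine.

```python
# Mythic Sark parse counts from the 10.1.5 data quoted above
enhance_parses = 877
ret_parses = 1275

difference = ret_parses - enhance_parses
ratio = ret_parses / enhance_parses

print(f"Ret has {difference} more parses than Enhance")
print(f"That is about {ratio:.2f}x as many")
```

So the lower-performing spec on that fight still logged roughly 45% more parses, which is the whole point about popularity not tracking performance.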