Tempering data: Extremely unlikely to be weighted

Edit: Halfway down the post there is a second set of data that also tracks streaks of repeated rolls. It's a smaller dataset, but remarkably close to expectation. Tempering is not weighted, and no obvious streak manipulation occurs regularly enough to show up in this sort of data.




This is preliminary data, limited to a single category on a single class.

Caveats:
Different categories could be weighted while this one isn't
I could have misclicked occasionally, though almost certainly not enough to change the data
Items could somehow be weighted INDIVIDUALLY, in which case this dataset isn't big enough to account for short-stretch variability
Weighting could change on each roll relative to the previous roll
Weighting could be applied at extremely low-impact values

I rerolled 500 one-handed melee weapons on Rogue using the Cutthroat category, which has 4 options. If they are equally weighted, each option should have a 25% chance of rolling. I used all 6 rerolls on every item, for 3000 total rolls.

Total rolls: 3000
Vulnerability damage: 818 27.3%
Cutthroat Damage: 738 24.6%
Cutthroat Critical Strike Chance: 696 23.2%
Cutthroat Attack Speed: 748 24.9%

Over 3000 attempts, this is close enough to the expected values that there is almost no chance the values are weighted to any significant degree. It is possible that they could be weighted within a couple of percent of each other, but there is essentially no reason to do that (and given that Vulnerability is the "desired" stat in this category, if the aim were to decrease the likelihood of getting the desired option, this data skews in the opposite direction).
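One standard way to put a number on "close enough" is a chi-square goodness-of-fit test against an even split. This is a sketch added for illustration, not the analysis used in the post. It needs only the Python standard library and computes the p-value from the series form of the regularized incomplete gamma function instead of calling a stats package.

```python
import math

def chi_square_uniform(counts):
    """Chi-square statistic and degrees of freedom against an equal split."""
    total = sum(counts)
    expected = total / len(counts)
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, len(counts) - 1

def chi2_sf(stat, df):
    """P(X >= stat) for a chi-square variable with df degrees of freedom.

    Uses the series for the lower regularized incomplete gamma P(df/2, stat/2).
    """
    if stat <= 0:
        return 1.0
    s, x = df / 2, stat / 2
    term = 1.0 / s
    series = term
    n = 1
    while term > series * 1e-15:
        term *= x / (s + n)
        series += term
        n += 1
    lower = math.exp(s * math.log(x) - x - math.lgamma(s)) * series
    return max(0.0, 1.0 - lower)

# Cutthroat category, 3000 rolls: Vulnerability, Damage, Crit Chance, Attack Speed
stat, df = chi_square_uniform([818, 738, 696, 748])
p = chi2_sf(stat, df)
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}")
```

On the Cutthroat counts this prints a statistic of about 10.25 on 3 degrees of freedom, with p ≈ 0.017. By this test the spread is a bit wider than a perfectly even split usually produces, which fits the caveat that a weighting of a couple of percent can't be ruled out by 3000 rolls; the largest excess is on Vulnerability, the desired stat. The frozen mastery counts from the second dataset (74, 70, 67, 65) give p ≈ 0.88.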

If you would like to repeat the experiment yourself, this is the (very crude) AHK script I used to count rolls. Please note that it takes a LOT of rolls to overcome short streaks; if you only roll a couple of hundred times or fewer, it proves very little. This is even more true for categories with more options.
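To see why a couple of hundred rolls proves very little, here is a quick sketch of the standard error of one option's observed share. It assumes (for illustration) that each roll is an independent draw from equally likely options.

```python
import math

def share_std_error_pct(rolls, options):
    """One standard error, in percentage points, of a single option's observed share."""
    p = 1 / options
    return 100 * math.sqrt(p * (1 - p) / rolls)

for rolls in (100, 200, 500, 3000):
    se = share_std_error_pct(rolls, 4)
    # roughly 95% of results fall within 1.96 standard errors of 25%
    print(f"{rolls:>5} rolls: 25% +/- {se:.1f} points (1 SE), +/- {1.96 * se:.1f} (~95%)")
```

With 4 options, 200 rolls still leaves about ±6 percentage points of ~95% noise on each option's share, versus about ±1.5 points at 3000 rolls. More options means a smaller expected share per option, so the same noise matters relatively more.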

This script in no way interacts with D4; it simply counts the number of times you press the keys 1, 2, 3, 4 and 5 and expresses those counts as a percentage of total key presses. If you cannot understand what it does by looking at the code below and are not comfortable using it, don't use it.

Edit: I have updated the script to write each roll to a log file called temperslog.txt, so you have access to the order of rolls if that is something you're interested in.

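#NoEnv
#SingleInstance Force
SetWorkingDir %A_ScriptDir%  ; write temperslog.txt next to the script

; Running counts for keys 1-5 and the grand total
v1:=0
v2:=0
v3:=0
v4:=0
v5:=0
v_total:=0

; Each hotkey adds one roll to its counter, appends the roll number
; to the log on its own line (preserving roll order), then refreshes the tooltip
1::
v1+=1
FileAppend, 1`n, temperslog.txt
gosub,tally
return
2::
v2+=1
FileAppend, 2`n, temperslog.txt
gosub,tally
return
3::
v3+=1
FileAppend, 3`n, temperslog.txt
gosub,tally
return
4::
v4+=1
FileAppend, 4`n, temperslog.txt
gosub,tally
return
5::
v5+=1
FileAppend, 5`n, temperslog.txt
gosub,tally
return

; Recompute the total and each key's share, rounded to one decimal place
tally:
v_total:=v1+v2+v3+v4+v5
v1t:=round(v1/v_total*100,1)
v2t:=round(v2/v_total*100,1)
v3t:=round(v3/v_total*100,1)
v4t:=round(v4/v_total*100,1)
v5t:=round(v5/v_total*100,1)

; Show counts and percentages near the top-left corner of the active window
tooltip,Total: %v_total% || V1:%v1% %v1t%`% || V2:%v2% %v2t%`% || V3:%v3% %v3t%`% || V4:%v4% %v4t%`% || V5:%v5% %v5t%`%,1,1
return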
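If the log holds one roll number per line, a short Python sketch can rebuild the counts and find the longest run of identical results. The file name and the one-number-per-line layout are assumptions here, so check your own file first. The log doesn't mark where one item ends and the next begins, so runs are measured across item boundaries.

```python
from collections import Counter
from pathlib import Path

def summarize_rolls(lines):
    """Count rolls and find the longest run of identical results.

    Assumes one roll number (1-5) per line; any other line is skipped.
    """
    rolls = [line.strip() for line in lines if line.strip() in {"1", "2", "3", "4", "5"}]
    counts = Counter(rolls)
    longest, current = 0, 0
    for i, roll in enumerate(rolls):
        # extend the current run if this roll repeats the previous one
        current = current + 1 if i > 0 and roll == rolls[i - 1] else 1
        longest = max(longest, current)
    return counts, longest

if __name__ == "__main__":
    log = Path("temperslog.txt")  # assumed log name and location
    if log.exists():
        counts, longest = summarize_rolls(log.read_text().splitlines())
        total = sum(counts.values())
        for option in sorted(counts):
            print(f"V{option}: {counts[option]} ({100 * counts[option] / total:.1f}%)")
        print(f"Longest run of the same result: {longest}")
```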

3000 rolls is not absolute proof, but it should be enough to convince all but the most tinfoil-hatted conspiracy theorists that the caveats above almost certainly don't apply.

Here are the two reasons people perceive a potential weighting of categories where one almost certainly doesn’t exist:

  1. You reroll every time you see a negative outcome, and never when you see a positive one. This means you see every negative outcome on a given item, but only ever one positive outcome per item, even if behind the scenes every outcome is equally likely.

  2. Pure and simple, confirmation bias.
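The first point can be made concrete with a small exact calculation. Assume, for illustration, one wanted option out of four equally likely ones and up to six rolls per item, stopping as soon as the wanted option appears. The stopping rule doesn't change the odds of any single roll; it only decides what you end up looking at.

```python
from fractions import Fraction

def seen_per_item(options=4, max_rolls=6):
    """Expected wanted and unwanted results seen per item when you stop on the wanted one."""
    p = Fraction(1, options)   # chance a single roll is the wanted option
    miss = 1 - p
    # The k-th roll only happens if the previous k-1 rolls all missed.
    expected_rolls = sum(miss ** k for k in range(max_rolls))
    expected_wanted = 1 - miss ** max_rolls  # at most one hit per item
    return expected_wanted, expected_rolls - expected_wanted

wanted, unwanted = seen_per_item()
print(f"wanted seen per item:   {float(wanted):.3f}")
print(f"unwanted seen per item: {float(unwanted):.3f}")
print(f"wanted share of everything seen: {float(wanted / (wanted + unwanted)):.2%}")
```

Under these assumptions you see about 2.47 unwanted results for every 0.82 wanted ones per item, yet the wanted share of all rolls seen stays at exactly 25%.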

I realise that a great many people will say "of course the values aren't weighted, why did you even bother doing this", but to those people I say: go and read any of the many, many threads on tempering and you will see handfuls of people who are convinced they are.




Edit: This is a post from ~190 posts into the thread containing a second set of data on the sorcerer Frozen Mastery category, including the raw data and streak counting. It's only about 10% of the size of the first test, but frankly the data is so normal it would be hard for it to be more so. There's nothing here; the bugs and behaviors people are ascribing to the system simply don't show up in these data sets. Either they are confined to specific characters or accounts, or else they are so deeply malicious that they only occur in very specific circumstances to screw with very specific people at specific times (i.e. they don't; take off the tinfoil).

Anyway, this is the post:

Ok, another ~276 rolls in I am stopping, because there is absolutely zero indication that ANY form of bug or streak manipulation is occurring. If there were some form of indicator that there might be, I would probably be inclined to keep going, but there simply isn’t.

https://docs.google.com/spreadsheets/d/e/2PACX-1vQADvlALisXIuI1l0EF4FkAUwzTDyTQUD_f2SVmBoLZ3rB4Zpo_ffMo6j0VUDf9gnB034u6L69Nom9Y/pubhtml?gid=0&single=true

That is the spreadsheet with the raw data and the streak calculations; you can look at it yourself.

Here are the final numbers after 276 rolls (46 items, somewhat ironic given that guy’s claim he did 47 some time yesterday :P)

Total rolls	276	% of rolls
1: 2x Frost Bolt	74	26.81
2: 2x Frozen Orb	70	25.36
3: 2x Icy Shards	67	24.28
4: Blizzard Size	65	23.55
Streak of 2	47
Streak of 3	11
Streak of 4	5
Streak of 5	2
Streak of 6	0

Streak of 2 on single item	46
Streak of 3 on single item	9
Streak of 4 on single item	4
Streak of 5 on single item	0
Streak of 6 on single item	0

TL;DR: on these 46 items, I rolled streaks of 2 of the same result 46 times, 3 of the same result 9 times, 4 of the same result 4 times, and zero streaks of 5 or 6 of the same result on a single item (across item boundaries there were 2 streaks of 5 and none of 6). These are all counted mutually exclusively (i.e. a streak of 2 isn't double counted when it's contained in a streak of 3). There's simply nothing here. Could it still be bugged and I just got really lucky for 46 items' worth of data? Sure. Is it likely? No. Nothing about this second data set stands out as untoward in any way, given an unweighted distribution and no funky streak functions or bugs.
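For comparison with the per-item streak counts, here is a sketch of the expected number of streaks of exactly k identical results on one item. It assumes each item gets 6 independent, evenly weighted rolls from 4 options (whether the original pre-reroll affix belongs in the sequence is an assumption; here it doesn't) and counts streaks the same mutually exclusive way as above.

```python
def expected_exact_streaks(rolls, options, k):
    """Expected number of maximal streaks of exactly k identical results
    in `rolls` independent rolls with `options` equally likely outcomes."""
    repeat = 1 / options       # chance a roll matches the one before it
    change = 1 - repeat
    total = 0.0
    for start in range(rolls - k + 1):
        begins = 1.0 if start == 0 else change        # differs from the roll before the streak
        ends = 1.0 if start + k == rolls else change  # differs from the roll after the streak
        total += begins * repeat ** (k - 1) * ends
    return total

items = 46
for k in range(2, 7):
    print(f"streak of {k} on single item: {items * expected_exact_streaks(6, 4, k):.2f} expected")
```

Under these assumptions, 46 items would be expected to show roughly 36.7, 7.5, 1.5, 0.27 and 0.04 streaks of 2 through 6 on a single item, which can be set against the observed 46, 9, 4, 0 and 0.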


Thanks for putting your time into this!

Nice work. Sadly, people are probably still going to insist on there being a weighted algorithm. People just have a hard time accepting random events; they'll always try to find a reason behind them. Couple that with, as you state, confirmation bias and people's general lack of knowledge of basic math and probability, and boom, you have yourself a conspiracy to keep us playing longer.


I accept these results, but still would like percentages listed by Blizzard in-game.

Blizzard has a poor prior record with the Enchantress, first denying weighted affixes and then saying "oops, our bad! Let's fix it" with Enchanting 2.0.


Nice numbers. I hope they are accurate. Maybe my luck will average out over time, but isn't that the gambler's fallacy? :stuck_out_tongue:

I'm not sure what the point of Blizzard listing them in game would be if you don't trust Blizzard in the first place. It would be especially weird if they aren't weighted; you'd just have 25% listed after every value.

But sure, it would be good if Blizzard categorically stated that this was the case, even if people are inclined not to believe them on face value.

Apparently you can’t reply too many times in a row, so I’ll add my replies to the below responses here :stuck_out_tongue:

Well yeah, I mean every value would simply have 1/number of options.

You certainly could; it would still be extremely unlikely to be weighted if categories were reaching into the <20% or >30% range, but that just didn't happen in this case. :shrug:

On the first 500 rolls or so vulnerable damage was sitting somewhere in the range of 40%, for interest’s sake.


I'm surprised at the results. I would have thought you'd see more variance.

Thanks. This should be pinned to all the posts :cry: about tempering and making claims of weighted blah blah.

Not all Tempers have 4 possibilities. Some have 5 or 3.

The gambler’s fallacy still describes a true outcome, it’s just that by and large the gambler doesn’t have the resources to reach equilibrium. Also, typically gamblers are in a position where they are at a weighted disadvantage in the first place.


Gamblers also bet against the house and, well, the house always wins.


I would agree that the temper rolls in this manual are weighted evenly. However, since each manual is different, this proves nothing about all the temper manuals. My guess is they are not weighted but the code sucks. That would be consistent for Blizzard. They just aren't good enough to weight the manuals and get it right. Maybe they are and it is a conspiracy, but with all the other systems that have math bugs, I tend to believe they just don't "math" well at Blizzard. Other people's experiences with tempering, including mine, are likely due to bugs. Thinking Blizzard did it intentionally is giving them way too much credit.


Literally addressed in the caveats and in the conclusion. If you actually believe this to be the case, feel free to run your own data on a different category. It’s absolutely possible; it’s just extremely unlikely. This would mean that Blizzard decided to weight different stats in different categories, but randomly decided that in this particular category that I just happened to pick, the stats were equally valuable with each other.


Yes, I was reiterating. Didn’t mean to imply it wasn’t mentioned. My apologies if it came across that way.


I would be curious to know how many times you rolled the same temper in a row. To me that seems to be the biggest complaint, with people seeing 3+ of the same temper again and again. Also, along those lines, did you have any rolls that repeated for all 6?

I ask that question because there was a bug at some point with enchanting where some items would randomly get locked on one possibility and only the value would change. The second enchanting choice however functioned normally.

I didn’t record each individual result so I don’t know; either is potentially possible without upsetting the overall trend of the data. I don’t remember getting a single item with the same result 6 times, but in 3000 rolls it’s possible I just didn’t notice it.
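For a sense of scale, here is a quick sketch of how likely an item with the same result on all 6 rolls is. It assumes independent, evenly weighted rolls from 4 options and the 500 items from the experiment.

```python
def p_all_same(options, rolls):
    """Chance that every roll on one item gives the same result (any option)."""
    # the first roll can be anything; each later roll must match it
    return (1 / options) ** (rolls - 1)

def p_at_least_one_item(per_item, items):
    """Chance that at least one of `items` independent items shows the event."""
    return 1 - (1 - per_item) ** items

per_item = p_all_same(4, 6)  # 1/1024
print(f"per item: {per_item:.5f}")
print(f"expected items out of 500: {500 * per_item:.2f}")
print(f"chance of at least one in 500: {p_at_least_one_item(per_item, 500):.1%}")
```

Under these assumptions about 0.49 such items would be expected in 500, with roughly a 39% chance of seeing at least one, so either finding one or not finding one would be unremarkable.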


That’s interesting. I’ve come to the conclusion it’s doubtful they weight the rolls as well, your data seems to support that at least in the sample you’ve done and I see no reason they would be doing it in other tempering rolls.

This is a great post.

I've added the ability for the script to write to a log file, so you will get a file called temperslog.txt in the same directory as the script. This records every roll in the order it occurs, which lets you track streaks.

I’m not redoing the experiment, but others might be interested in it.

If you want to restart the test with new data, delete or clear out the log file.


I did my own experiment a few weeks back when I was still playing. I started tempering by alternating which of the two manuals I wanted something from. As an example, I wanted damage and Blizzard size on a wand. I'd roll "Damage to Distant Enemies". Even though I wanted damage, I'd move to the other affix first. I did not get "Increased Blizzard Size". I rolled again and got the same affix that had rolled previously. I then moved back to the damage affix. I didn't get it, but it was different from the one that was there. I then went back to the other affix that had repeated, and this time I got Blizzard size. The item never got the damage roll and, while not bricked, wasn't as good as what I had. I did this for 4 wands and eventually got the affixes I wanted. However, I was always able to prevent a duplicated affix roll from showing up a 3rd time. I had experienced 4, 5 or 6 repeats of the same affix many times before trying this. Not proof, but my gut reaction is that alternating between the two affixes somehow tends to break the chain of repeats. There is possibly some connection between the two affixes you are rolling. Perhaps switching which affix you reroll resets something in their code. This is why I am leaning toward a bug in the code for the overabundance of "same affix" rolls.

Putting my engineer hat on for a minute, you will generally initialize random number generation with a seed value. If you always use the same seed value, the random numbers generated will always be in the same order. This is why it is always good to set a random seed value first such as the current system time. So if they are generating a seed value for each affix, it is quite possible they reuse the seed value generated for that affix when it first rolled. If they do not modify the seed value, and reseed the random generator with it, it will give you the exact same value it did before.

So it appears to me that alternating which affix you roll regenerates a seed value while staying on the same affix and rerolling seems to produce quite a few consecutive “same values”. Random is random. The odds of so many people experiencing so many long runs of the same value leads me to believe something is causing the same seed value to be used to initialize the random number generator and the first value output will always be the same.
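The seeding behaviour described here is easy to demonstrate in isolation. This sketch uses Python's `random.Random` purely as a stand-in; nothing in it reflects how Diablo IV actually generates rolls, and the affix pool and the idea of a stored per-affix seed are hypothetical, taken from the hypothesis above.

```python
import random

AFFIXES = ["Option A", "Option B", "Option C", "Option D"]  # hypothetical pool

def reroll_reusing_seed(stored_seed):
    """Hypothetical bug: a new generator reseeded with the same stored value every reroll."""
    rng = random.Random(stored_seed)
    return rng.choice(AFFIXES)

def reroll_fresh(rng):
    """Expected behaviour: keep drawing from one generator that is never reseeded."""
    return rng.choice(AFFIXES)

buggy = [reroll_reusing_seed(1234) for _ in range(6)]
shared_rng = random.Random()  # no seed given: seeded from OS randomness (or the time)
fresh = [reroll_fresh(shared_rng) for _ in range(6)]
print("reused seed:", buggy)  # the same option six times
print("fresh draws:", fresh)  # varies from run to run
```

Reusing a seed this way would repeat the result on every single reroll, so it would stand out very clearly in a log that records roll order.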


What happens with some other rolls is that there is a throw-in stat with no relevance to the other rolls, watering down the pool and thinning out the chances of a desired result. One has been added to the sorc Pyromancy Endurance category, which only has 3 possible outcomes, with Flame Shield duration being the best option. The second option is the Warmth passive, which nobody uses but is still relevant to pyromancy, and the 3rd option is a lucky hit 5% chance to heal for 873-1173 life.

That last choice is an obvious throw-in to water down the results, and a terrible roll as well.

Barbarian has one as well in Demolition Finesse. Kick damage, Charge damage and Death Blow damage make sense. Damage while Iron Maelstrom is active is pure trash, meant to reduce the chances of rolling what most people would probably want from that category, which is Death Blow with its other synergies that can multiply its damage.
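The effect of one extra option on the odds can be put into numbers. This is a sketch that assumes each roll picks evenly among the options and that an item allows 6 rolls, the number used in the experiment above.

```python
from fractions import Fraction

def chance_to_hit(options, rolls=6):
    """Chance of seeing one specific option at least once in `rolls` evenly weighted rolls."""
    # complement of missing the wanted option on every roll
    return 1 - Fraction(options - 1, options) ** rolls

for options in (2, 3, 4):
    p = chance_to_hit(options)
    print(f"{options} options: {float(p):.1%} chance to hit the wanted one in 6 rolls")
```

Under these assumptions a 2-option pool gives about a 98.4% chance over six rolls; adding a third, throw-in option lowers that to about 91.2%, and a fourth to about 82.2%.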