I looked at some deep reinforcement learning (RL) white papers, and RL seems to be a good fit for many of the topics you’ve discussed.
With RL, you have goal-oriented algorithms that work toward a goal through multiple layers of neural networks, weighting the actions that lead to that goal. In the case of StarCraft or Diablo 4, Blizzard exposes an API on their Agent (the character) for programmatic control.
An example, using D3, might be beating Belial. If the RL Agent is standing in a poison pool and dying, it decreases the weights on the actions that led to dying, eventually weeding them out. If a particular skill, or the timing of a certain skill, is used on the way to beating Belial, it increases those weights to reinforce that skill or timing. Very simplified, but the Agent can “learn” how to beat objectives by repetition: weeding out actions that don’t work and promoting actions that do. Like step-wise refinement, it just gets better and better by gradually weighting actions up or down.
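The up/down weighting idea can be sketched in a few lines. This is a minimal toy, not Blizzard's API or a full RL algorithm; the action names, learning rate, and update rule are all my own hypothetical choices to illustrate the mechanism.

```python
import random

# Toy sketch: each action's weight rises when an attempt that used it
# succeeded (Belial down) and falls when the attempt ended in a death.
# Action names are hypothetical, not from any real game API.
ACTIONS = ["stand_in_poison", "dodge_poison", "cast_vault", "cast_multishot"]
weights = {a: 1.0 for a in ACTIONS}
LEARNING_RATE = 0.1  # assumed value; how fast weights move per attempt

def choose_action():
    """Pick an action with probability proportional to its weight."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for action, w in weights.items():
        r -= w
        if r <= 0:
            return action
    return ACTIONS[-1]

def update(actions_taken, won):
    """Strengthen the actions used on winning attempts, weaken them on deaths."""
    for a in actions_taken:
        if won:
            weights[a] *= 1 + LEARNING_RATE
        else:
            weights[a] *= 1 - LEARNING_RATE
```

Over many attempts, "dodge_poison" ends up outweighing "stand_in_poison" simply because it appears on more winning runs, which is the weeding-out effect described above.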
On topic, with an RL Agent, they could make all the items, sets, and gear in the game part of the system for the Agent to use. The Agent, on its own, could run all the various sets for Demon Hunter, for example, to see which ones work best for which encounters: T16 rifts, GR120+, and so on.
Because the RL Agent is in a scripted virtual environment (with some RNG affecting boss and monster movesets, etc.), it would, based on what I’ve read, actually train extremely quickly by just looping over and over until it has either reached its success goal of defeating the rift, GR, boss, etc., or failed, knowing that its power level or gear is not sufficient to reach the goal.
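That looping-until-a-verdict loop might look like this. The environment here is a stand-in for the real game (a single clear-probability number replaces actual combat), and the attempt cap and success threshold are assumptions I made up for the sketch.

```python
import random

MAX_ATTEMPTS = 10_000      # assumed cap before declaring the gear insufficient
REQUIRED_CLEARS = 10       # assumed success goal: clear the content this many times

def run_attempt(clear_chance):
    """One simulated rift attempt; RNG stands in for boss/monster movesets."""
    return random.random() < clear_chance

def train_until_verdict(clear_chance):
    """Loop attempts until the agent reaches its success goal ('cleared')
    or exhausts the cap ('underpowered' - gear can't reach the goal)."""
    clears = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if run_attempt(clear_chance):
            clears += 1
            if clears >= REQUIRED_CLEARS:
                return "cleared", attempt
    return "underpowered", MAX_ATTEMPTS
```

A strong build (high clear chance) returns a verdict in a handful of simulated attempts, which is why a scripted environment can grind through this so fast compared to a human playing in real time.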
By doing the above with all the classes, items, and gear combinations, it would be able to show, as a rating metric, which builds are better for T16, GRs, and so on, in a very precise way. Goals for the RL Agent with T16 might include keys, DBs, gold, gems, and other factors beyond just beating the content as quickly as possible.
As it goes through all the gearing combos, it would have Sage’s or Cain’s on at some point for T16 goals, as an example. Those sets would end up with strengthened weights because they yield more DBs, keys, and so on.
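Turning those runs into a rating could be as simple as averaging the reward each set earned per run, per goal. The set names and reward numbers below are hypothetical; the point is that a farming set like Sage's ranks higher on a T16 goal because its runs yield more DBs.

```python
from collections import defaultdict

def rate_sets(runs):
    """runs: list of (set_name, goal, reward) tuples from simulated runs.
    Returns {(set_name, goal): mean reward} - the 'effectiveness' rating."""
    totals = defaultdict(lambda: [0.0, 0])  # (sum of rewards, run count)
    for set_name, goal, reward in runs:
        entry = totals[(set_name, goal)]
        entry[0] += reward
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical run log: reward here is DBs earned per T16 run.
runs = [
    ("Sage's Journey", "T16", 12.0),
    ("Sage's Journey", "T16", 10.0),
    ("Unhallowed Essence", "T16", 6.0),
]
ratings = rate_sets(runs)
```

Keying the rating on (set, goal) rather than just the set is what later lets the same data produce separate T16, GR-push, and PvP rankings.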
I’m confident that this RL system would come up with a very accurate rating of the items and gear in the game because it has hard data on their effectiveness at beating objectives. After all, it has used every class, every set, and every item combination in the game to beat objectives. It knows what works and what doesn’t.
So there’s your item “effectiveness” rating system - the product of an RL Agent’s work.
And there’s your tiering, because tiering is just dividing up the items and their ratings into rough tiers: the highest-ranked in the top tier, the lower-ranked in lower tiers. This exercise also showed that categories would be needed for T16, GR push, PvP, and so on. Different goals are best served by different gear.
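The tiering step itself is mechanical once the ratings exist: sort within a category and split into rough buckets. The S/A/B/C labels and the example ratings are my own placeholders.

```python
def tier_items(ratings, tiers=("S", "A", "B", "C")):
    """ratings: {item: score} for one category (e.g. T16).
    Returns {item: tier}, highest-rated items in the top tier."""
    ranked = sorted(ratings, key=ratings.get, reverse=True)
    bucket = max(1, -(-len(ranked) // len(tiers)))  # ceiling division
    return {item: tiers[min(i // bucket, len(tiers) - 1)]
            for i, item in enumerate(ranked)}

# Hypothetical T16 ratings produced by the RL runs above.
t16_ratings = {
    "Sage's Journey": 11.0,
    "Cain's Destiny": 9.5,
    "Unhallowed Essence": 6.0,
    "Marauder": 5.0,
}
t16_tiers = tier_items(t16_ratings)
```

Running the same function over the GR-push or PvP ratings gives each category its own tier list, matching the point that different goals are best served by different gear.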
So by adjusting my OP to include your AI idea, in the specific form of an RL system to rate items, I’ve eliminated the need for the balance team to rate items.
We now have an automated system for trading with accurate item ratings and tiers. It’s BoA, it allows free trade, and you have to have something good to get something good. There is at least some level of RMT deterrent. There will always be RMT on some level with FT, but this system dramatically reduces the incentive.
The core precept of needing something good in the first place to get something good in return seems to hold up well.