Yes, we agree on this.
I never said it was. This is how you get accused of strawmanning; my argument has nothing to do with the coefficients always being positive, but the relative importance of healing between the specs in relation to the rest of their toolset. And that relative importance changes based on key, key level, and even group composition.
While it would be potentially possible to have all of these factors modeled correctly, this gets into where your own arguments fall apart. You’ve insisted that the singular factors you’ve outlined would allow for perfect balance, but they simply cannot. Interactions between far too many factors exist to boil balancing down to a single one. For any singular combination of factors, a single tuning knob could be leveraged to reach perfect balance, but when any of those factors has a different value, you will no longer have perfect balance. And with the sheer volume of interactions and the power behind some of the combinations, you would go very far away from perfect very quickly.
Neural networks don’t even have a 100% rate of identifying whether a photo contains a dog or a cat, and that has been one of the standard use cases of early NN research so the models have gotten quite domain specific. Yet I’m supposed to believe a system with more variables and nearly infinite permutations of factors is supposed to reach 100% because…?
AlphaGo and AlphaZero are both playing turn-based games with 100% mappable states and symmetric gameplay and interactions. Each player has access to the same number of pieces of certain types, those types do the same thing for both players every time, and there is no randomness. If our computers were more powerful today, these games could be 100% brute forced by game engines. These are not even in the same realm of complexity as mapping out perfect balance in WoW, and honestly attempting to do so really hurts your credibility.
AlphaStar is a closer analogy for sure, but there are two key differences that you’d have to demonstrate aren’t going to be deal breakers for bringing to WoW.
First, Starcraft is a game heavily reliant on actions per minute. Even suboptimal strategy can be successful if you act faster than your opponent. The only realistic limitation in how quickly any computer will be able to act in Starcraft will be a game engine limitation. A computer player could potentially net positive results with an incomplete or inaccurate strategic gameplan if it simply issued 4 times as many actions as its opponent, something that a computer would easily be able to do.
Second, and more importantly, the measuring stick for AlphaStar isn’t perfection, it’s simply to be better in a head-to-head situation than a human. AlphaStar could make mountains of mistakes in its positional evaluations yet still win because of the human player having even worse understanding of strategic possibilities and execution. This is a vastly different task to achieve than ensuring all permutations of group compositions reach the same level of content completion at all levels and circumstances. AlphaStar is impressive and I’m certain there are enough similarities in the NN that it would be a good starting point, but the success criteria are so different from what you’re suggesting be achieved for WoW that the existence of one does not guarantee the success of the other.
And yet regardless of whatever solution the NN decides upon, it will still be a solution better than any human developer. Hence the point of the proposal.
The NN converges upon a local minimum solution, not the best solution; however, the local minimum solution is the best possible solution within reason (whereas the best possible theoretical solution is usually only a hair better than the local minimum solution, and thus not worth the computational power to derive).
Do you really think that human developers can triumph over AI in 2023 in numerical balancing? Serious question.
If you do not think that human developers can trump an AI in 2023, then why do you refuse to use the AI for this purpose?
- Yes. Human players have been finding ways to break the game that human developers weren’t able to predict or prepare for. If a human can find a way to break it, they will break it and until an AI can predict every possible outcome or interaction from non-existent or incomplete data, an AI will never be able to balance it.
With hourly updates, any such exploit would be fixed within one hour, which is why the OP demands hourly updates. They would run out of new exploits within a few hours.
The data also isn’t incomplete. The OP demands a full accounting of all player performances in M+ in an internal, all-encompassing, private Blizzard database. This database records a total of three numbers: Damage Dealt, Healing Done, and Key Complete (binary value of true or false) for each player for each instance. That’s the only info you need for each key number.
Hourly updates, even as hotfixes that don’t require restarts, won’t be able to keep track of everything. You would have to track the highest-end players to determine whether the data is even valid to balance around, and keep track of cases where someone who is not a top performer discovers something that ‘makes’ them one. That raises the question of “exploit or legitimate?”, as with bear druids soloing high keys last season: do they get nerfed for performing too well, or was there an unforeseen interaction somewhere?
Also, the concept of tracking healing done is a joke, as anyone who understands the content knows that healing-done parses are a joke (I’m a top 100 World Parser for Devastation healing haaa). The better the group, the less healing is required overall. In M+, for example, a group that doesn’t CC or interrupt requires more healing than one that does, so an amazing group with the best healer ever will have a healer doing far less healing than an average group with an average healer.
There is a lot of merit to your argument concerning healing done.
This is why we have civil discussions. You are the first person in the entirety of this thread to make a meaningful counter argument to the proposed method regarding healing.
Now let us theorycraft how then we decide the top10% healer populations. This will be fun.
Since neither healers nor tanks can be carried through high keys, I think we can find a good solution.
The rudimentary form I have at the moment is damage dealt by the healer, avoidable damage taken by the healer, and capping healing done in the parse by the formula:
Healing Done vs (Total Damage to Group - Avoidable Damage Dealt to Group - Self Healing Done by Group), such that it reads H vs D-A-S, where we seek to maximize H with respect to D-A-S.
Then let D-A-S be Z, where Z is all damage dealt to the group that was unavoidable and not reversed via self healing, then we desire healers such that H (total healing done) equals Z, such that we are checking the ratio of H/Z. Let this ratio be T.
Then for determining the top 10% of healers for each class and spec we are comparing T values and damage dealt (call this W). It is not yet clear how one adjudicates which healer is better than the other in respect to T and W simultaneously unless it is also measured against key completion rate, as it is clear that T approaches 1 and W approaches 0 (relative to all damage dealt by the group) as the key level increases to infinity.
EDIT:
Thus it follows that healers should be judged solely by the T value, since increasing W is already accounted for by the resulting increase in key completion rate.
This solution also avoids the conundrum of tank self healing (blood dk in particular), since T=H/Z and Z already subtracted away all self healing, regardless of which tank is being used for the group parse.
Thus healers whose T value is close to 1 will be the top 10% for the class/spec, we then add a linear weight of K to each parse, where K is the key number. The healer always experiences no net gain or loss for dps/tanks that fail to use their self healing.
So having a T value of 1 at K=30 is 3 times better than having a T value of 1 at K=10.
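For anyone who wants to poke at the numbers, here is a minimal sketch of the T = H/Z idea with the linear key-level weight K. The class and field names are hypothetical, the cap at 1 follows the "capping healing done" remark above, and the handling of Z <= 0 is purely an assumption for illustration.

```python
from dataclasses import dataclass


@dataclass
class HealerParse:
    """One healer's numbers for a single key (all field names are hypothetical)."""
    healing_done: float        # H
    total_damage_taken: float  # D: all damage dealt to the group
    avoidable_damage: float    # A: avoidable damage dealt to the group
    group_self_healing: float  # S: self healing done by the group
    key_level: int             # K


def t_value(p: HealerParse) -> float:
    """T = H / Z with Z = D - A - S, capped at 1."""
    z = p.total_damage_taken - p.avoidable_damage - p.group_self_healing
    if z <= 0:
        return 0.0  # assumption: nothing unavoidable to heal, so no credit
    return min(p.healing_done / z, 1.0)


def weighted_score(p: HealerParse) -> float:
    """Linear key-level weight: score = K * T."""
    return p.key_level * t_value(p)


# Made-up numbers: both healers cover exactly the unavoidable, un-self-healed damage.
low = HealerParse(900_000, 1_500_000, 400_000, 200_000, key_level=10)
high = HealerParse(2_700_000, 4_000_000, 800_000, 500_000, key_level=30)
print(weighted_score(low), weighted_score(high))  # 10.0 30.0
```

With T = 1 in both parses, the K = 30 parse scores three times the K = 10 parse, which matches the weighting described above.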
Better is subjective. Would spec outliers in damage and healing performance be closer than today? Probably. Would the game feel better to play? Not in my opinion.
There’s a high likelihood that the optimal solution that can exist would be for all specs to converge on similar play styles in order to reduce variability. There is no way to balance a burst spec like havoc DH with a ramp spec like feral druid simultaneously for massive AOE pulls, small cleave pulls, and single pulls while their damage profiles are such a chasm apart like they are today. The expectation is that the burst spec jumps ahead quickly and then wanes giving the ramp spec time to catch up. Any variance in fight duration or pull count will give the edge to one or the other, even when balanced against one another perfectly. If you make these level during one kind of pull it will make them fall out of whack during a different kind of pull.
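To make the burst-versus-ramp point concrete, here is a toy model with made-up numbers (none of these values come from the game). The burst spec front-loads its damage and the ramp spec builds linearly to a peak, so a single tuning can only equalize them at one pull duration.

```python
def burst_total(t: float, burst_dps: float = 300.0, window: float = 10.0,
                sustain_dps: float = 100.0) -> float:
    """Total damage after t seconds: high DPS inside the burst window, lower after."""
    early = min(t, window)
    late = max(t - window, 0.0)
    return burst_dps * early + sustain_dps * late


def ramp_total(t: float, peak_dps: float = 150.0, ramp_time: float = 20.0) -> float:
    """Total damage after t seconds: DPS rises linearly to peak_dps, then holds."""
    if t <= ramp_time:
        return 0.5 * peak_dps * t * t / ramp_time
    return 0.5 * peak_dps * ramp_time + peak_dps * (t - ramp_time)


for seconds in (30.0, 70.0, 120.0):
    print(seconds, burst_total(seconds), ramp_total(seconds))
```

With these hypothetical parameters the two profiles are level only on a 70-second pull. The burst spec is ahead on a 30-second pull and the ramp spec is ahead on a 120-second pull, even though neither number was changed.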
No, I don’t. In fact, I even tried to give you credit for the NN idea and how I think it could help drive better balance in WoW earlier in this thread:
I am certain that from a purely numbers perspective, the logs from spec A, B, through spec Z will all be closer together than they are today. But that would require either significant domain knowledge to try to balance the different damage profiles, tanking profiles, and healing profiles across every permutation of spec that could exist (which you claim would not be necessary) or a severe shift toward homogenization. A single set of numbers cannot leave all specs at the same place when they deal their damage in fundamentally different ways across all combinations, keys, levels, and pull strategies.
My issue isn’t with using GPS to help navigate me to a place in a city I’ve never been before, my issue is with suggesting I hand the keys over fully to the GPS to drive me there itself. AI should be used (if it’s not already) to help sift through volumes of data and pinpoint problem areas. There are many times when suggestions it spits out for balance changes will be great ideas the devs should be able to act quickly upon.
It’s just having the AI perform the changes unchecked every hour will NOT leave the game in a better place than it is today. The changes would necessarily have to drive every spec toward dealing similar styles of damage in order to account for the variability in packs, bosses, group combinations, routes, and everything else that contributes to the experience of a run.
Making changes every hour would be an absolute nightmare for the teams you’re balancing around in the first place. Groups in the top 10% make routes work by coordinating offensive and defensive CDs such that they do a big pull followed by several small pulls. If numbers are adjusting every day (much less hour), these groups would constantly have to be adjusting their strategies because what worked yesterday won’t work today if any of the puzzle pieces that made their pull work changed in some meaningful way. Yes there may be something that compensates for the negative, but it takes trial and error for them to pinpoint what the new meta should be; by the time they’ve done that, it will have changed all over again because of the latest balance pass.
I play a feral druid. It is certainly very frustrating to rarely be within shouting distance of the meta, and I would desperately love some AI assistance for Blizzard to bring us more in line with the leaders. But M+ is a season-long measuring stick, figuring out how to make things work. I can make peace with the fact that my ceiling is lower on a feral druid this season than it would be if I played Havoc. But being in a position where my ceiling (and floor) are changing hourly while trying to push at my group’s limits would be absolutely infuriating. I know what works and doesn’t on my feral druid, giving me realistic targets to shoot for; I don’t want those targets shifting multiple times per day, much less week or month.
Perfect numerical balance sounds like a great goal to shoot for. In practice, however, the consequences required to achieve it would be immense. And quite frankly, anyone who thinks the player experience as a whole would be better if the game functioned like that is wrong.
Fair enough. We’ll have to agree to disagree.
I believe equity in key completion rates accounts for the differences in burst vs sustained by the very virtue of having the same completion rate for each key level.
I see the system as allowing for more diversity in class fantasy and design, rather than pushing towards homogenization of playstyles, because regardless of how diverse the playstyles are, they achieve uniform key completion rates.
I understand what a NN is solving for and I understand the difference between completion rate and completion number.
And again: in a sample of Affliction locks vs BM hunters, the Affliction locks will more than likely have VERY different completion rate statistics than BM hunters, even if you are just taking the top 10% of each population.
The people playing Affliction at, say, a +20 key level will be a smaller population, as it is a more challenging spec that is easier to perform poorly on and requires more group coordination to play around. The top 10% of that population will have better completion rates at higher keys.
It is more likely for 100 out of 1,000 Aff lock players to consistently play better and have higher key level completion rates than it is for 10,000 BM hunters out of 100,000. So the BM hunters, even though they are a far better performing spec than Aff lock, will comparatively have a lower key level completion rate than Aff lock at the top 10% of their class/spec populations. And in looking at completion rates, 10,000 BM hunters will be buffed to have the same +20 completion rate as 100 Affliction locks.
So this creates a problem of the weaker class not getting buffs, or getting worse, simply because of the population variance and concentration of skill.
There is drastically more skill variance in 100,000 MM hunters than there is in 1000 Aff Locks.
So any buffs and nerfs, hour by hour, would seesaw.
And still… The game cannot be balanced this way, because so much of the gameplay is not relative to +/- damage and healing. There are mechanics, CCs, pulls, etc. that you can’t brute force. And what you are suggesting is that the NN tunes for brute forcing all encounters and dungeons, ignoring CC, interrupts, dispels, CDs, defensives, etc. While you may say the NN will account for this, I fail to see how it can when its tuning knobs are +/- damage and healing.
So please, if you could, explain how a NN will account for the non +/- damage and healing factors in dungeon completion. How exactly will it compare group A, with a class/spec comp that has 1 8sec and 1 30sec interrupt, to a group that has 3 8sec interrupts, 1 10sec interrupt, 1 30sec interrupt, 3 ST stuns, and 1 AoE stun? These factors cannot be overcome with +/- damage/healing.
He already stated that group comp is irrelevant to look at, because every single person in that top 10% uses the same exact perfectly optimized group comp with 100% of the same talents.
He also already stated that understanding any of that is irrelevant, because knowing how WoW actually works outside of looking at damage/healing numbers on a spreadsheet is irrelevant.
So do not expect much of an answer.
No.
Proper usage of CC, dispels, cooldowns, etc, increases a person and group’s key completion rates.
The algorithm doesn’t seek to make damage and healing done equal; it seeks to make key completion rates equal. This means classes with extreme utility will have mostly negative damage and healing modifiers, while those that have little to no utility will have positive damage and healing modifiers.
Damage dealt is only used to determine the top decile of damage dealers for each class and spec in order to determine the population of that class and spec whose key completion rates will be monitored.
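As a rough sketch of the bookkeeping described here (and not anyone’s actual implementation), the three recorded numbers per run can be used to pick the top decile of a spec by damage and then read off that group’s completion rate at a key level. The `Run` record, the function names, and ranking players by average damage across their runs are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Run:
    """One player's record for one key (hypothetical schema)."""
    player: str
    spec: str
    key_level: int
    damage_dealt: float
    healing_done: float
    key_complete: bool


def top_decile_players(runs: List[Run], spec: str) -> Set[str]:
    """Players of `spec` ranked by average damage; keep the top 10% (at least one)."""
    damage = defaultdict(list)
    for r in runs:
        if r.spec == spec:
            damage[r.player].append(r.damage_dealt)
    ranked = sorted(damage, key=lambda p: sum(damage[p]) / len(damage[p]), reverse=True)
    cutoff = max(1, len(ranked) // 10)
    return set(ranked[:cutoff])


def completion_rate(runs: List[Run], spec: str, key_level: int) -> Optional[float]:
    """Share of completed keys at `key_level` among the spec's top-decile players."""
    top = top_decile_players(runs, spec)
    relevant = [r for r in runs
                if r.spec == spec and r.key_level == key_level and r.player in top]
    if not relevant:
        return None  # no data for this spec at this key level
    return sum(r.key_complete for r in relevant) / len(relevant)
```

Under this reading, the modifiers would be nudged until `completion_rate` comes out the same for every spec at each key level; damage only decides who is in the monitored population.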
The problem is that something like this isn’t even reliable on its own. Some classes/specs perform quite well in specific situations and might kick back misleading conclusions.
For example, Unholy DK in M+, when specced a specific way and with extremely large pulls, has some of the highest damage in the game with quadratic scaling. It does not do that kind of damage in your typical M+ group. It sounds like your AI might account for Unholy DKs in premades that have parties dedicated to a specific composition and balance accordingly, hurting the majority of Unholy DKs.
Likewise, Arcane is historically a spec that might underperform overall due to difficulty of execution and hostile fight design, yet it’s not uncommon for top Arcane mages to parse drastically above the standard in raid encounters. I don’t know how it’s doing currently, I am just giving an example from past experience. How would AI handle that?
Sure, but it’s highly unlikely that such an equilibrium point for all group comps could possibly be the same. I know you want to consider only the 50,000 foot view, but that’s constructed from bricks every foot to the top.
Your supposition is that a damage number for the low utility spec will be high enough to have the same impact as the lower DPS but higher utility of another spec. But that would require the high DPS spec to deal enough damage to kill the threatening mob before it reaches the cast the higher utility spec would handle with a stop. For that pack this could be balanced, but not every pull is the same. When the next 3 pulls don’t require that stop, the high DPS group gets to keep with that advantage while the higher utility group gains nothing from having the extra utility.
There is simply no way to turn a single knob in the right way for 39 specs when the game mechanics are built around hundreds of factors. You either need to consider identifying and changing those factors as well or removing those factors such that only the knobs you’re turning are the only ones that matter. You can’t fix a leaning house by rotating the top floor windows, you have to address the foundation.
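A toy version of the "kill it before the cast" argument, with invented numbers: a dangerous cast only matters if the mob survives long enough to finish it and the group has no stop left. The function names and every value below are hypothetical.

```python
def time_to_kill(mob_hp: float, group_dps: float) -> float:
    """Seconds needed to kill the mob at a constant group DPS."""
    return mob_hp / group_dps


def cast_goes_off(mob_hp: float, group_dps: float, cast_at: float,
                  stops_available: int) -> bool:
    """True if the dangerous cast completes: the mob outlives it and nobody can stop it."""
    if time_to_kill(mob_hp, group_dps) <= cast_at:
        return False  # the mob dies before the cast finishes
    return stops_available == 0


# Pack 1: the high-DPS, zero-utility group kills the mob just before the cast at 10s,
# and the lower-DPS group survives by spending its stop.
print(cast_goes_off(1_000_000, 110_000, 10.0, stops_available=0))
print(cast_goes_off(1_000_000, 80_000, 10.0, stops_available=1))

# Pack 2: same groups, but the mob has 50% more health.
print(cast_goes_off(1_500_000, 110_000, 10.0, stops_available=0))
print(cast_goes_off(1_500_000, 80_000, 10.0, stops_available=1))
```

In this sketch a damage number that makes the two groups equivalent on the first pack no longer does so on the second, while on packs with no dangerous cast at all the higher-DPS group simply finishes sooner.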
The top tenth percentile contains other types of distortions though.
The OP is right: WoW needs an ML balancing tool.
Anyone who thinks that an ML model can’t be used to arbitrarily balance all facets of WoW tuning (M+, raid, PvP) simultaneously has no clue about modern technology or its current capabilities, or how simple this would actually be for a career data scientist working for Blizzard (not as a hobby thing, but as someone’s job).
I’ve discussed this here on this thread:
And in this thread:
This idea sounds awful.
Your character’s performance could be different every time you log in, without notice.
Based on calculations for content you might not even be doing.
I would vote “no”.
The algorithm doesn’t seek to make damage and healing done equal; it seeks to make key completion rates equal. This means classes with extreme utility will have mostly negative damage and healing modifiers, while those that have little to no utility will have positive damage and healing modifiers.
I never said make damage and healing equal. The algorithm would not balance things that drastically increase completion rates. Eventually keys scale to a point where missing an interrupt or a stun is a death. No amount of damage or healing can recover from it.
There are aspects of the game which +/- damage and healing will not balance. The algorithm cannot account for this.
And the way you are defending it is very indicative of someone who needs to be right rather than someone who would rather be correct.
At this point I am disengaging from the debate after reading your responses. You have not clearly articulated how this would account for the many non +/- Damage and Healing factors that contribute to success in M+.
And additionally, we have not broached how this would affect PvP and Raid. Because plain and simple, it wouldn’t and can’t.
Thanks for the conversation on the topic. But you have not answered any of my questions, or several other people asking similar questions.
So until you can explain
There is drastically more skill variance in 100,000 MM hunters than there is in 1000 Aff Locks.
So any buffs and nerfs, hour by hour, would seesaw.
And still… The game cannot be balanced this way, because so much of the gameplay is not relative to +/- damage and healing. There are mechanics, CCs, pulls, etc. that you can’t brute force. And what you are suggesting is that the NN tunes for brute forcing all encounters and dungeons, ignoring CC, interrupts, dispels, CDs, defensives, etc. While you may say the NN will account for this, I fail to see how it can when its tuning knobs are +/- damage and healing.
So please, if you could, explain how a NN will account for the non +/- damage and healing factors in dungeon completion. How exactly will it compare group A, with a class/spec comp that has 1 8sec and 1 30sec interrupt, to a group that has 3 8sec interrupts, 1 10sec interrupt, 1 30sec interrupt, 3 ST stuns, and 1 AoE stun? These factors cannot be overcome with +/- damage/healing.
I’m out.
“Your supposition is that a damage number for the low utility spec will be high enough to have the same impact as the lower DPS but higher utility of another spec. But that would require the high DPS spec to deal enough damage to kill the threatening mob before it reaches the cast the higher utility spec would handle with a stop. For that pack this could be balanced, but not every pull is the same. When the next 3 pulls don’t require that stop, the high DPS group gets to keep with that advantage while the higher utility group gains nothing from having the extra utility.”
Which is exactly how it should be so that they have equal completion rates for the same key level.
If we’re seeing an obscene +500% damage modifier on a zero utility spec then it means the NN is doing its job, and that human developers need to intervene to figure out why such a high modifier was needed for this spec and fix it accordingly (which has been said many times already in this thread).
As for the other person mentioning “missing an interrupt…” blah blah, that’s a skill issue, not a balance issue. Players that often fail to interrupt won’t be in the top 10%. The Neural Network proposal doesn’t fix “Git gud issues,” nor should it, nor was it ever supposed to.
Which is exactly how it should be so that they have equal completion rates for the same key level.
But the point is they wouldn’t. Again, you’re wanting to ignore the building blocks that lead to whether a key is completed and only focusing on the end result. But that’s quite literally an important task because different combinations of building blocks will never have the same completion rates if you only focus on the end product.
Just consider two different DPS profiles, output (O) and utility (U). The output spec is banking on having enough damage to kill threats before the group dies, while the utility spec uses their tricks to reduce the threat until it dies. If all we cared about is O versus U, this could probably be controlled by a single tuning knob. But there are 8 unique combinations of these two specs that all need to have the same completion rate. If OUU is able to kill mobs fast enough, then what’s that going to do for OOO? And UUU will need to be able to keep control over the mobs long enough to handle the lower DPS, even with diminishing returns. This task is basically impossible with only one tuning knob, and we only considered two DPS specs. There are 24 more specs to throw in before we get to tanks and healers.
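For a sense of scale, the compositions in the O/U example can simply be enumerated. This is only a counting sketch: treating the three DPS slots as ordered gives 8 assignments for two specs, while ignoring slot order gives 4 distinct comps. The figure of 26 DPS specs is taken from the "24 more specs" count above and is not verified against the game.

```python
from itertools import combinations_with_replacement, product


def ordered_assignments(specs, slots=3):
    """Every way to fill the DPS slots when slot order matters."""
    return list(product(specs, repeat=slots))


def unordered_comps(specs, slots=3):
    """Distinct DPS comps when slot order is ignored (OUU is the same as UOU)."""
    return list(combinations_with_replacement(specs, slots))


print(len(ordered_assignments("OU")))     # 8
print(len(unordered_comps("OU")))         # 4
print(len(unordered_comps(range(26))))    # 3276
```

Either way of counting, each composition is a separate target that has to land on the same completion rate, and that is before tanks and healers multiply the total.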
If we’re seeing an obscene +500% damage modifier on a zero utility spec then it means the NN is doing its job, and that human developers need to intervene to figure out why such a high modifier was needed for this spec and fix it accordingly (which has been said many times already in this thread).
And how long do you think that’s going to take? Do you not understand how abysmal the player experience will be to have hourly tuning passes for even the average player? Not to mention that the groups on the bleeding edge, AKA the top 10%, rely on slim margins to make the pulls they do work. They will have a run at 2:45 that works great for a tough pull only to have that pull fall apart completely in the run at 3:30 because one of their DPS lost 5% of their damage which extended the pull by an extra 10 seconds leading to the fatal overlap they skipped entirely 45 minutes prior.
I’m sorry if you can’t see just how awful that would be to play. I’m pretty certain it would kill the game.
As for the other person mentioning “missing an interrupt…” blah blah, that’s a skill issue, not a balance issue. Players that often fail to interrupt won’t be in the top 10%. The Neural Network proposal doesn’t fix “Git gud issues,” nor should it, nor was it ever supposed to.
You seriously think the top 10% of players are absolutely perfect? Also missing an interrupt (or any other mechanic for that matter) can be caused by environmental factors outside the player’s control; the top 10% of players will be most likely to improvise to make it work, but sometimes having a melee get randomly targeted with a mechanic that requires them to run out can leave them out of range to kick, for example. The top key pushers have more failed keys than not when they reach the pinnacle of character power; sometimes that’s down to bad luck on certain overlaps but oftentimes it’s them executing a mechanic poorly.