TLDR: The following explains in technical terms how and why Blizzard has resorted to layering instead of fixing the actual problem. If you’re interested, read on. And believe it or not - this IS the short version.
Blizzard? Are you listening?
Here’s the actual reason there are queues and layering: Lack of foresight and planning on the part of both management and engineering.
Now, I’m not privy to the details of how Blizzard actually has these servers set up, as that’s all NDA type stuff and no-one outside the company will (or should) have that. However, as a 30+ year veteran in the networking and IT field, I can say this without hesitation.
It doesn’t matter either way if the servers are physical or virtual. It starts with capacity. Not player caps. I’m talking CPU processing, memory and the speed of storage solutions for each core unit that represents what the players see as a “server”.
Layering was originally designed less as a tool for correcting and adjusting server capacity than as a convenience for the client (or player, if you will). At the dawn of WOW, hardware-accelerated 3D rendering was new, at least at the consumer level. Rendering over 300 player and NPC models together with the terrain and object meshes was brutal on old 3D cards (or worse… SOFTWARE rendering!). Layering fixed this by reducing the number of players on screen in densely populated areas, which reduced the poly-count that client hardware had to render.
Let’s say that we have a “server” that can support a maximum of 1000 players, for example. The common practice in this case is to over-subscribe that server and give it a theoretical maximum of 1500, hopefully having done enough research to know that even at high capacity times - only about 950 of those 1500 players are going to be connected and playing.
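To put rough numbers on that over-subscription example, here is a minimal sketch in Python. The figures (1000, 1500, 950) come straight from the paragraph above; the function names are made up for illustration and have nothing to do with how Blizzard actually models capacity.

```python
import math

# Numbers from the example above (illustrative only).
TECHNICAL_CAPACITY = 1000  # players the hardware can really serve at once
SUBSCRIBED_CAP = 1500      # over-subscribed theoretical maximum
EXPECTED_PEAK = 950        # players expected online at the busiest time


def oversubscription_ratio(subscribed: int, capacity: int) -> float:
    """How many accounts are allowed per real slot of capacity."""
    return subscribed / capacity


def peak_concurrency_rate(peak: int, subscribed: int) -> float:
    """Fraction of the subscribed population online at peak."""
    return peak / subscribed


def headroom(capacity: int, peak: int) -> int:
    """Free slots left at the expected peak (negative means a queue)."""
    return capacity - peak


def max_safe_subscribers(capacity: int, concurrency_rate: float) -> int:
    """Largest cap that keeps the expected peak within real capacity."""
    return math.floor(capacity / concurrency_rate)


if __name__ == "__main__":
    rate = peak_concurrency_rate(EXPECTED_PEAK, SUBSCRIBED_CAP)
    print("over-subscription ratio:", oversubscription_ratio(SUBSCRIBED_CAP, TECHNICAL_CAPACITY))
    print("peak concurrency rate:", round(rate, 3))
    print("headroom at peak:", headroom(TECHNICAL_CAPACITY, EXPECTED_PEAK))
    print("max safe cap at that rate:", max_safe_subscribers(TECHNICAL_CAPACITY, rate))
```

The thing to notice is how thin the margin is: with only 50 free slots at peak, a launch-week surge in the concurrency rate blows straight through the real capacity, and that is where queues come from.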
This isn’t the problem either. Or, at least, it shouldn’t be if there was enough foresight to have the ability to virtually shift some of this hardware capacity where it’s needed. Like I said, it’s common practice and if it’s done right, no-one ever notices. This is especially true if the hardware that the servers are being hosted on is virtual, e.g. AWS or Azure, which should be able to dynamically expand its capacity based on demand.
This problem becomes contractual and is limited by budget - or - how much is Blizzard willing to pay these virtual services for the amount of capacity they need, when they need it. The questions then become:
1: Has Blizzard reached their contracted capacities and if so, why are they not negotiating for higher capacities?
2: Has Blizzard reached the virtual capacity limit set by the hosts without the option of additional capacity? If so, then this is a serious oversight by the design team. Either player caps need to be reduced per server instance, OR special contract negotiations need to take place with the hosts in order to find more capacity - if available.
But what if the hardware isn’t virtual?
Typical set-ups for this use-case indicate that each “server” isn’t a single server, but a collection of satellite servers coordinated by a central director, or “root” server.
In this case, additional hardware may not be needed - IF - the satellite servers are configured in a dynamic fashion where their capacities can be freed up from one server instance and transferred to another with minimal effort.
Example:
Each server instance has 1 root server and 6 satellites (blades or shards or leaves or whatever vernacular your company subscribes to). Each instance has a capacity of 1500 players (over-subscribed).
Server A is a PVE server with 992 players active. Near technical capacity.
Server B is a PVP server with 425 players active. 43% technical capacity.
The Fix: Gracefully halt shard 6 of server B, transferring any active players to shards 1-5. Reconfigure shard 6 to be shard 7 of the server A cluster. No additional hardware needed and can be easily reversed. A good set of engineers would have foreseen this scenario and planned accordingly.
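The fix above can be sketched in a few lines of Python. This is a toy model under loud assumptions: the Cluster class, the per-shard capacity of 167 (roughly 1000 players over 6 shards) and the way players are spread across shards are all invented for illustration. A real halt-and-transfer would involve session migration and a lot more care.

```python
from dataclasses import dataclass
from typing import Dict

PER_SHARD_CAPACITY = 167  # assumption: ~1000 players / 6 shards


@dataclass
class Cluster:
    name: str
    shards: Dict[int, int]  # shard id -> active players on that shard

    def capacity(self) -> int:
        return len(self.shards) * PER_SHARD_CAPACITY

    def active(self) -> int:
        return sum(self.shards.values())

    def utilization(self) -> float:
        return self.active() / self.capacity()


def drain_shard(cluster: Cluster, shard_id: int) -> None:
    """Gracefully halt a shard: move its players to the remaining shards."""
    players = cluster.shards[shard_id]
    others = [s for s in cluster.shards if s != shard_id]
    free = sum(PER_SHARD_CAPACITY - cluster.shards[s] for s in others)
    if players > free:
        raise RuntimeError("not enough room on remaining shards to drain")
    # Fill the least-loaded shards first.
    for s in sorted(others, key=lambda s: cluster.shards[s]):
        moved = min(PER_SHARD_CAPACITY - cluster.shards[s], players)
        cluster.shards[s] += moved
        players -= moved
        if players == 0:
            break
    del cluster.shards[shard_id]


def move_shard(donor: Cluster, receiver: Cluster, shard_id: int) -> int:
    """Drain a shard on the donor and re-home it, empty, on the receiver."""
    drain_shard(donor, shard_id)
    new_id = max(receiver.shards) + 1
    receiver.shards[new_id] = 0
    return new_id


# Server A: 992 active players. Server B: 425 active players.
server_a = Cluster("A (PVE)", {1: 166, 2: 166, 3: 166, 4: 166, 5: 166, 6: 162})
server_b = Cluster("B (PVP)", {1: 71, 2: 71, 3: 71, 4: 71, 5: 71, 6: 70})

new_shard_id = move_shard(server_b, server_a, 6)
print(server_a.name, "shards:", len(server_a.shards), "utilization:", round(server_a.utilization(), 2))
print(server_b.name, "shards:", len(server_b.shards), "utilization:", round(server_b.utilization(), 2))
```

Reversing it is the same call with donor and receiver swapped, which is the whole point: the capacity moves, the hardware count doesn't.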
It’s not about whether or not each server instance was designed with enough capacity to begin with. It’s about whether or not each server instance was designed to be able to have ADDITIONAL capacity if the need arose.
I can’t even get into the server software side of things without running on for nine more paragraphs. Building in the fluid dynamics of adjustable capacities and memory management starts on the whiteboard before even the first character on the keyboard is struck. If this is botched or forgotten, a complete re-write of the server engine is often the answer and in that case, everyone loses until the code monkeys hash it all out. (That’s code humor).
Always overestimate your repair estimates to both management and your customers and you can walk away with your reputation as a miracle worker. Anyone who recognizes this will be familiar with the philosophy of one Montgomery Scott of the USS Enterprise. Corny, but no less effective as an engineer’s mantra.
And to the executives who are worried about their bottom line? To you, I say this: If you are going to continue to beat the living daylights out of this dead horse, you should at least invest enough time and money to get it to look like it’s kicking every now and again. It’s not hard. It’s probably not all that expensive provided you have good, experienced people who know what they’re doing.
I’m sure there’s more to it than this. There always is. My point being, very little is impossible with today’s technology. You don’t even have to be particularly innovative or revolutionary. Most of that is already done in one form or fashion. You just have to be willing to make the investment, whether that be time or money or both. Employ the right people with the right experience. More often than not, clients cry out for what they think will fix it, when in fact the problem is unrelated. Clearly the problem we face is design. Blizz engineers should feel confident and be backed by management when they tell players to get stuffed while they work on an actual fix and not just implement a series of bandaids while the whole project continues to hemorrhage its lifeblood all over the floor.
So, go ahead Blizzard. Gimme a call. I’ll fix your stuff for ya.