Since we have tons of experts on the subject of downtime-less patching, I am interested in hearing your plans for converting a 20+ year old system that lives on bare metal to be seamless in patching.
Please provide your architecture diagrams in Visio format. Please provide standard proposal formats with backout plans, go/no-go criteria, etc.
Cata went live the week after patch day without a reset.
Mists went live the week after a patch day without a reset.
IMO, the way you do a mid-xpac patch without this much downtime is to break it up into two sections: one that patches the week before and one that patches the week after. This should mean not having to lose AN ENTIRE DAY to patching and “maintenance.” I am going to assume part of these windows is that something didn’t work correctly and they are trying to fix it. But, here’s hoping …
While I agree with the spirit of us old-timers who have been around through many hundreds of these… it does stand to reason that as we move into the future, it really shouldn’t be taking the entire day to do this.
Yes, the game as a concept is 20+ years old, but the systems in place to make it run have all been replaced little by little over time. It’s time for all sub-based games to start making “patch day” a thing of the past, or to have it down to a “patch hour.”
It’s really going to depend on the resources that are involved. We know there are login servers, and I’m assuming an RDBMS for auth and/or data at the least, but what are the applications and containers (if any)? How are the data RDBMSes set up?
We’d need a lot more information to go on to determine that, including the interactions between the systems.
Of course it’s going to be a difficult thing – but it’s something a lot of companies do, and no, it’s not something we’re going to solve on the forums without an NDA.
DB migrations are pretty much a thing of the past. ORMs and atomic transaction models were all the hype in the Rails era and haven’t completely left the building yet, but places like Google, Facebook, even X or Instagram use schemaless databases that can be shifted at runtime and, this is the cool thing, reverted in one atomic write.
Consider the amount of data shuffled by, say, YouTube. In a sense, it’s no different from WoW, with a networked server and a client communicating with each other and keeping some sort of state. Now, sure, WoW has to manage many more concurrent data updates rather than streams, but by breaking them out, you could conceivably create the shadow database on a replication layer and then, when the maintenance hour arrives, simply switch over. Same DB; if something goes wrong, you simply switch back. Since state B, the new state, has already been maintained in the background, only minutes to half an hour would be needed.
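To make the switch-over idea concrete, here is a minimal Python sketch. Everything in it is hypothetical: the DatabaseRouter class, the two dicts standing in for state A and state B, and the key names are made up for illustration and say nothing about how Blizzard's backend actually works. The only point is that if every read goes through one indirection, the cutover and the rollback are each a single pointer swap.

```python
import threading


class DatabaseRouter:
    """Hypothetical router: every query goes to whichever backend is live."""

    def __init__(self, live, standby):
        self._lock = threading.Lock()
        self._live = live
        self._standby = standby

    def query(self, key):
        # Readers never know (or care) which backend they are hitting.
        with self._lock:
            return self._live.get(key)

    def switch(self):
        # Cutover and rollback are the same operation: swap the two pointers.
        with self._lock:
            self._live, self._standby = self._standby, self._live


# Plain dicts stand in for the old state (A) and the pre-built new state (B).
state_a = {"schema_version": 1, "player:1": {"name": "Thrall", "level": 80}}
state_b = {"schema_version": 2,
           "player:1": {"name": "Thrall", "level": 80, "talents": []}}

router = DatabaseRouter(live=state_a, standby=state_b)

version_before = router.query("schema_version")       # served by A
router.switch()                                       # maintenance: go live on B
version_after = router.query("schema_version")        # served by B
router.switch()                                       # something broke: back to A
version_rolled_back = router.query("schema_version")  # served by A again
```

The sketch deliberately ignores the hard part, which is keeping B in sync with writes that land on A while B is being prepared.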
scoffs You are completely disregarding the Pfetzer valve. Obvious rookie mistake.
I’m gonna guess WoW doesn’t use NoSQL, but DB migrations w/ Hibernate, etc. are considerably easier. Hell, even PHP has them on a per-commit basis.
However, I’d bet that a lot of these changes aren’t backwards compatible, so they need to be done when the changes happen. To minimize risk they probably do any others at the same time – things that wouldn’t necessarily have an effect like adding columns or tables, or whatever data isn’t being accessed by the application.
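As a rough illustration of the "wouldn't necessarily have an effect" kind of change, here is a small sketch using Python's built-in sqlite3 module. The characters table, the renown column, and the old_client_read and new_client_read functions are all invented for the example; SQLite is just a stand-in, not a guess at the real stack. It shows an additive column change that code written against the old schema never notices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE characters ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, level INTEGER NOT NULL)"
)
conn.execute("INSERT INTO characters (name, level) VALUES ('Jaina', 80)")


def old_client_read(db):
    # Old application code: names its columns, knows nothing about new ones.
    return db.execute(
        "SELECT name, level FROM characters ORDER BY id"
    ).fetchall()


def new_client_read(db):
    # New application code: also reads the column added by the migration.
    return db.execute(
        "SELECT name, level, renown FROM characters ORDER BY id"
    ).fetchall()


before = old_client_read(conn)

# Additive migration: a new column with a default breaks no existing query.
conn.execute("ALTER TABLE characters ADD COLUMN renown INTEGER NOT NULL DEFAULT 0")

# Old-style writes still work because the new column has a default.
conn.execute("INSERT INTO characters (name, level) VALUES ('Anduin', 70)")

after = old_client_read(conn)
with_new_column = new_client_read(conn)
```

Dropping or renaming a column is the opposite case: the old query would fail the moment the change lands, which is why those changes have to ship together with the new application code.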
The problem with replication is how you handle updates in that period. So maintenance starts, you start replicating DB A at time X, and then after some time you have a copy in DB B at time Y. However, you’ve got to account for any changes to A between times X and Y. A replay log would make that fairly trivial (again, depending on backwards compatibility), but I’m going to bet they’re not doing anything append-only like Cassandra. Probably dealing with MySQL/MariaDB or Oracle… I haven’t kept up with those in a bit, though, in terms of newer functionality for larger datasets.
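The X-to-Y gap can be sketched in a few lines of Python. This is a toy model under loud assumptions: PrimaryDB, its in-memory log, and catch_up are hypothetical names, and a real system would lean on the database's own replication log rather than a Python list. It just shows why a replay log makes the catch-up step straightforward: the copy remembers the log position of its snapshot and replays everything written after it.

```python
import copy


class PrimaryDB:
    """Toy stand-in for DB A: a dict plus an append-only write log."""

    def __init__(self):
        self.data = {}
        self.log = []  # entries are (key, value), in write order

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def snapshot(self):
        # Return a copy of the data and the log position it corresponds to.
        return copy.deepcopy(self.data), len(self.log)


def catch_up(replica, primary, from_position):
    """Replay every write made after the snapshot; return the new position."""
    for key, value in primary.log[from_position:]:
        replica[key] = value
    return len(primary.log)


db_a = PrimaryDB()
db_a.write("gold:1", 100)

# Time X: start copying A into B.
db_b, position = db_a.snapshot()

# Players keep writing to A while the copy is in flight.
db_a.write("gold:1", 250)
db_a.write("gold:2", 40)

# Time Y: replay the gap so B matches A before any switch-over.
position = catch_up(db_b, db_a, position)
```

The replay only works as-is when the logged writes still make sense against B's schema, which is the backwards-compatibility caveat again.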
I do. With the “great” Cata update announcement came an announcement that the underlying infrastructure would be streamlined. If you recall, we had dozens of problems in the Cata prepatch that seemed to be DB-related. Same before SL.
Blizzard would be idiots, and I tend to think they aren’t, not to move to more performant solutions in front- and backend.