I'm currently preparing for an AWS certification exam. I see that AWS S3 (Simple Storage Service) is designed for 99.999999999% durability (that's eleven 9's) and 99.99% availability over a given year. Makes me curious how many 9's are in Blizzard's realm durability.
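For a sense of scale, here's a rough Python reading of those two S3 figures. The assumption, which is mine and not AWS's actual model, is that (1 - durability) acts as an independent yearly loss probability per object. Under that reading, 10 million objects works out to about one lost object every 10,000 years, and 99.99% availability leaves roughly 52.6 minutes of downtime a year. The function names are just illustrative.

```python
# Back-of-the-envelope reading of S3's published design targets.
# Assumption: (1 - durability) is an independent annual loss probability
# per object. This is an illustration, not how AWS models durability.

ANNUAL_DURABILITY = 0.99999999999  # eleven 9's
ANNUAL_AVAILABILITY = 0.9999       # four 9's
MINUTES_PER_YEAR = 365 * 24 * 60


def expected_objects_lost_per_year(num_objects: int) -> float:
    """Expected number of objects lost in one year under the assumption above."""
    return num_objects * (1 - ANNUAL_DURABILITY)


def allowed_downtime_minutes_per_year(availability: float) -> float:
    """Minutes per 365-day year a service may be unavailable at this availability."""
    return MINUTES_PER_YEAR * (1 - availability)


if __name__ == "__main__":
    lost = expected_objects_lost_per_year(10_000_000)
    print(f"Years between single-object losses: {1 / lost:,.0f}")
    downtime = allowed_downtime_minutes_per_year(ANNUAL_AVAILABILITY)
    print(f"Downtime budget at 99.99%: {downtime:.1f} min/year")
```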
Apples and oranges…
Durability against what, a wrecking ball? Being kicked? Falling off a shelf?
My cast iron safe is designed for 99.99999% durability but my Toyota Prius is not? WTF?
Absolutely obnoxious.
You know how many 9's are in China's Blizzard realm durability? 0
A storage container is not the same as a complex software program that's constantly being updated.
Your Prius is actually designed to wear out with use and crumple on impact. Doesn't mean it's a good design.
Inclusive or exclusive of maintenance?
Exclusive of maintenance, I'm going to guess that they've got around 99% availability, which is about 1.7 hours of downtime per week on average.
Inclusive of maintenance, it's closer to 95%.
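Quick check of that arithmetic in Python: a week has 168 hours, so 99% availability allows about 1.68 hours down per week and 95% allows about 8.4 hours. The 99% and 95% inputs are the guesses from the post above, not published Blizzard numbers, and `weekly_downtime_hours` is just an illustrative helper.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def weekly_downtime_hours(availability_pct: float) -> float:
    """Average hours per week a service is down at the given availability (percent)."""
    return HOURS_PER_WEEK * (100 - availability_pct) / 100


if __name__ == "__main__":
    for pct in (99.0, 95.0):
        print(f"{pct}% availability -> {weekly_downtime_hours(pct):.2f} hours down per week")
```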
Cloud-based storage and scheduled app maintenance are two completely different animals. Talk to any enterprise-level developer about their expectations re: availability of cloud storage in a testing / production environment versus actual application uptimes / maintenance windows. They'll expect to get to their data and code almost all the time, and they'll expect apps (particularly internal apps) to go down for scheduled maintenance on a regular basis. Unless they're doing in-tandem "blue-green" upgrades, which require a lot of virtual computing power, there are just going to be times when apps aren't available. And in those situations, clients do what they have to do: try again later.
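A minimal sketch of the blue-green idea, assuming a single router that sends all traffic to one of two identical environments. `BlueGreenRouter`, `deploy`, and `health_check` are hypothetical names for illustration, not any real deployment tool's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    # Which build each environment is currently running (hypothetical model).
    versions: Dict[str, str] = field(default_factory=lambda: {"blue": "v1", "green": "v1"})
    live: str = "blue"  # environment currently receiving player traffic

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> bool:
        """Upgrade the idle environment, then switch traffic only if it passes the check."""
        target = self.idle
        self.versions[target] = version
        if not health_check(version):
            return False  # live environment untouched; players never saw the bad build
        self.live = target  # the "flip": one pointer change instead of a maintenance window
        return True

    def serving_version(self) -> str:
        return self.versions[self.live]


if __name__ == "__main__":
    router = BlueGreenRouter()
    router.deploy("v2", health_check=lambda v: True)
    router.deploy("v3", health_check=lambda v: False)  # rejected, v2 keeps serving
    print(router.live, router.serving_version())
```

Running two full environments side by side is the extra compute cost mentioned above; the payoff is that an upgrade becomes a pointer flip rather than downtime.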
???
It's entirely different comparing the MTBR of a storage device with a complex stack, where the combinatorics of many interacting parts keep increasing the chance of failure.
Doesn't matter if you're using S3 on AWS or S3-style object storage on GCP, Azure, or MinIO; all you're doing is providing storage throughput. Now, can you get to that storage array from your region? How is the performance going to be?
This is where something like CAP (consistency, availability, partition tolerance) comes into play, on top of raw performance. Six-nines availability with 1 Mbps throughput isn't going to cut it whatsoever, even if you cover the other components.
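To put numbers on the throughput point: six nines allows only about 31.5 seconds of downtime a year, yet moving a single 1 GB file over a 1 Mbps link takes 8,000 seconds (about 2.2 hours). The 1 GB file size is an arbitrary example; `transfer_seconds` is an illustrative helper that ignores protocol overhead and latency.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
ONE_GB = 10**9  # decimal gigabyte, in bytes


def transfer_seconds(size_bytes: int, throughput_bps: float) -> float:
    """Ideal time to move size_bytes over a link of throughput_bps bits per second."""
    return size_bytes * 8 / throughput_bps


if __name__ == "__main__":
    print(f"Six-nines downtime budget: {SECONDS_PER_YEAR * 1e-6:.1f} s/year")
    print(f"1 GB at 1 Mbps: {transfer_seconds(ONE_GB, 1e6) / 3600:.1f} hours")
    print(f"1 GB at 1 Gbps: {transfer_seconds(ONE_GB, 1e9):.0f} seconds")
```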
So now you shard and replicate across regions (A), which increases complexity, and then you run into a consistency (C) problem: how do you keep the data consistent in a multi-region environment?
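A toy illustration of the consistency-versus-availability choice, using N/R/W quorum replication (a common technique in Dynamo-style stores, not necessarily what Blizzard uses). With R + W > N, every read quorum overlaps the latest write quorum, so reads see the newest value. When a network partition leaves too few replicas reachable, this sketch picks consistency and refuses the request rather than letting regions diverge. `Replica`, `QuorumRegister`, and the region names are hypothetical, and the single coordinator handing out version numbers is a simplification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    region: str
    version: int = 0
    value: Optional[str] = None
    reachable: bool = True  # set to False to simulate a network partition


class QuorumRegister:
    """Toy N/R/W quorum register with a single coordinator assigning versions."""

    def __init__(self, replicas: List[Replica], r: int, w: int) -> None:
        # R + W > N means every read quorum overlaps the most recent write quorum.
        if r + w <= len(replicas):
            raise ValueError("need R + W > N for reads to see the latest write")
        self.replicas, self.r, self.w = replicas, r, w
        self.next_version = 1

    def _reachable(self) -> List[Replica]:
        return [rep for rep in self.replicas if rep.reachable]

    def write(self, value: str) -> bool:
        up = self._reachable()
        if len(up) < self.w:
            return False  # choose consistency: refuse rather than let regions diverge
        for rep in up[: self.w]:
            rep.version, rep.value = self.next_version, value
        self.next_version += 1
        return True

    def read(self) -> Optional[str]:
        up = self._reachable()
        if len(up) < self.r:
            return None  # can't guarantee the latest value, so report unavailable
        return max(up[: self.r], key=lambda rep: rep.version).value


if __name__ == "__main__":
    replicas = [Replica("us"), Replica("eu"), Replica("asia")]
    realm = QuorumRegister(replicas, r=2, w=2)
    realm.write("gold=100")
    replicas[0].reachable = replicas[1].reachable = False  # partition isolates asia
    print(realm.write("gold=200"), realm.read())  # False None: unavailable, but not wrong
```

An availability-first design would accept the write on the lone reachable replica instead, which keeps the realm up but risks conflicting copies once the partition heals.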
Yup, your current colo has six nines, but there is no data consistency (or data is now being duplicated, with everyone duping gold or items via an AH exploit).
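One generic way a dupe like that can happen is two requests both acting on the same stale read of an item. A common guard is optimistic concurrency: each item carries a version number, and a transfer only applies if the version hasn't changed since it was read. This is a hypothetical sketch (`Item`, `ItemStore`, `transfer` are made-up names), not a description of how any actual auction house bug worked.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Item:
    owner: str
    version: int = 0


class ItemStore:
    """Hypothetical item store guarded by compare-and-set on a version number."""

    def __init__(self) -> None:
        self.items: Dict[str, Item] = {}

    def read(self, item_id: str) -> Item:
        item = self.items[item_id]
        return Item(item.owner, item.version)  # snapshot, like a read from a replica

    def transfer(self, item_id: str, seen: Item, new_owner: str) -> bool:
        """Apply the transfer only if nobody changed the item since `seen` was read."""
        current = self.items[item_id]
        if current.version != seen.version or current.owner != seen.owner:
            return False  # stale snapshot: another transaction already won the race
        self.items[item_id] = Item(new_owner, current.version + 1)
        return True


if __name__ == "__main__":
    store = ItemStore()
    store.items["sword-1"] = Item("seller")
    snap_a = store.read("sword-1")
    snap_b = store.read("sword-1")  # second buyer read the same state
    print(store.transfer("sword-1", snap_a, "buyer_a"))
    print(store.transfer("sword-1", snap_b, "buyer_b"))  # rejected: no duplicate sale
```

The check-and-set here is only safe because it runs in one Python process; in a real database it has to be a single atomic conditional update, and across regions something still has to be the one authority for that version number, which is exactly the multi-region consistency problem.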
That is the crux of the problem: maintaining DB consistency in a multi-region environment without sacrificing performance, while still ensuring availability.
Big bucks if you know your stuff; otherwise, just focus on DevOps.