Questioning 99.999% uptime: When are four nines enough?
High availability is a fact of IT life, but for many organizations, 99.999% uptime isn't necessary.
Article by Erin Watkins.
Whether an organization needs 99.999% uptime depends on a number of factors, which range from software limitations to cost -- financial or otherwise.
"Five nines is what you need if human life or millions of dollars per minute are on the line," said Alan Robertson, Linux developer and founder of the High-Availability Linux project.
Weighing high availability's costs
The costs of achieving high availability, whether it's four nines of uptime or five, include software, hardware, manpower and training. Organizations need to weigh those costs against the losses incurred from unplanned downtime and against their ability to schedule planned downtime. Even in industries where millions of dollars are on the line, such as stock trading, companies can easily schedule downtime -- as long as it's not during trading hours, Robertson said.
The biggest losses from unplanned downtime come from business disruption and lost revenue, according to a recent survey of 41 datacenters by Emerson Network Power, a datacenter facilities vendor. An organization's response to unplanned downtime -- detecting the problem, fixing it and getting systems back up and running -- also costs money, the survey said.
In a shop with 99.99% uptime, a company can expect about 52.6 minutes of downtime per year (0.01% of the 8,760 hours in a year). If each hour of downtime costs $1 million, the total loss is about $876,000. But if that shop had 99.999% uptime, there would be only about 5.3 minutes of downtime per year, with a total loss of roughly $88,000. Where downtime is that expensive, the price of increasing uptime may be worth it, said Sander van Vugt, an independent trainer and consultant based in the Netherlands.
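The arithmetic behind these "nines" figures is simple enough to check directly. The short Python sketch below is an illustration only: the function names are hypothetical, the year is assumed to be 365 days (8,760 hours), and the $1 million-per-hour cost is the article's hypothetical rather than a measured figure.

```python
# Illustrative sketch: annual downtime and loss implied by an uptime percentage.
# Assumes a 365-day year; function names are hypothetical, not from any library.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def annual_downtime_hours(uptime_percent: float) -> float:
    """Hours per year a system is expected to be down at the given uptime."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


def annual_downtime_cost(uptime_percent: float, cost_per_hour: float) -> float:
    """Expected yearly loss, assuming every hour of downtime costs the same."""
    return annual_downtime_hours(uptime_percent) * cost_per_hour


# Compare three, four and five nines at the assumed $1 million per hour.
for pct in (99.9, 99.99, 99.999):
    minutes = annual_downtime_hours(pct) * 60
    loss = annual_downtime_cost(pct, 1_000_000)
    print(f"{pct}% uptime: {minutes:.1f} minutes down per year, about ${loss:,.0f} lost")
```

Each additional nine cuts the allowed downtime by a factor of 10: roughly 8.76 hours per year at three nines, about 52.6 minutes at four and about 5.3 minutes at five. Whether the next nine is worth buying depends on how the avoided loss compares with the added cost of hardware, software, manpower and training.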
"For any product there is a cost/value tradeoff," said Wayne Gateman, an area coordinator of virtualization for a Fortune 15 company in the medical distribution and software field. "What will the downtime do to you? How much can you suffer in a downtime? What are the risks of going down?"
Four nines should work for most high-volume shops like online retailers or Web hosts, and for other, offline businesses, even three nines should do, van Vugt said.
Getting to 99.999% uptime
But in some industries, such as transportation, high availability is critical no matter the cost. In the Netherlands, for example, all trains stopped because of a computer failure at a hub location, and hundreds of thousands of people got stranded.
"99.999% is for those industries," van Vugt said.
In those cases, organizations may turn to fault-tolerant servers. Multiple redundancies -- in the server hardware itself, the failover software and the software that splits the physical box to force failovers for upgrades -- make fault-tolerant servers worth the price, if you need them, Gateman said.
Before the switch to fault-tolerant Stratus servers, Gateman's company used software-based failover to keep the production environment running, but it didn't always work as planned.
"Software is software, and sometimes it doesn't always pick up on a failure, whereas the hardware is definitely going to report a failure," Gateman said. "And by having built-in redundancy, that failure generally won't take the virtual center down."
His shop added a second server just to be sure.