Zeichick's Take: When the cloud was good, it was very very good. But when it was bad, it was horrid.
Stories Columns Opinions Resources
Preflight builds spread wings for smoother projects
Developers are increasingly turning to preflight builds, allowing them to experiment with ...
|
Coverity creates program to enforce code adherence
The Architecture Analyzer uses mapping technology from the company's Software DNA static a...
|
QCon 2008 features domain-driven development
This year's QCon invites speakers like Eric Evans and Dan North to talk about domain-drive...
|
.NET similarities prove golden for Silverlight
Microsoft has focused on making Silverlight 2 symmetric with the .NET platform, and that h...
|
SOA Watch: New economic realities
In the current economic downturn, agile programming and SOA are attractive options that bu...
|
Integration Watch: A new twist on threads
The key to raising the efficiency of multiprocessors is to shrink the overall workload by ...
|
Integration Watch: The Return of NetRexx?
Java scripting languages are seeing a surge in popularity, with NetRexx looking particular...
|
Windows & .NET Watch: Transaction crowd gets a boost
With multicore chips becoming the standard for processors, the need for a flexible, usable...
|
From the Editors: Election should shake up JCP
Rod Johnson has the right ideas for opening up the Java Community Process, and he may be a...
|
Letters to the Editor: Sun gives REST, SOAP choice
A reader takes issue with a headline on our story about Sun working with REST along with S...
|
Guest View: Be smart and lazy
The optimal solution for problems is the simplest one, so always aim to streamline your ap...
|
Zeichick's Take: From EXEC to EXEC 2 to REXX to NetRexx
Andrew Binstock's column last week, "The Return of NetRexx," brought back some fond memori...
|
Advanced Corda CenterView™ Data Visualization for the BusinessObjects™ Intelligence Platform
Corda Technologies presents a white paper on pervasive BI. The BusinessObjects business in...
|
From Mobile to SOA: A Guide for Optimized Application Deployment
Customer need has driven the emergence of multiple computing tiers. Today’s application d...
|
e-Kit: Web Application Security
Is your network secure? What about your web applications.
If IT security is your top p...
|
Practical tips for saving money on code maintenance
If software design is expensive, well, code maintenance is even more so. When you look...
|
By Alan Zeichick
July 24, 2008 —
Cloud computing took a big hit this week amid two significant service outages. The biggest one, at least as it affects enterprise computing, is the eight-hour failure of Amazon’s Simple Storage Service. Check out Amazon Web Services service health dashboard, then select Amazon S3 in the United States for July 20. You’ll see that problems began at 9:05 a.m. Pacific Time with “elevated error rates,” and that service wasn’t reported as being fully restored until 5 p.m.. About the error, Amazon said:
We wanted to share a brief note about what we observed during yesterday's event and where we are at this stage. As a distributed system, the different components of Amazon S3 need to be aware of the state of each other. For example, this awareness makes it possible for the system to decide to which redundant physical storage server to route a request. In order to share this state information across the system, we use a gossip protocol. Yesterday, we experienced a problem related to gossiping our internal state information, leaving the system components unable to interact properly and causing customers' requests to Amazon S3 to fail. After exploring several alternatives, we determined that we had to temporarily take the service offline so that we could clear all gossipped state and restart gossip to rebuild the state.
These are sophisticated systems and it generally takes a while to get to root cause in such a situation. We're working very hard to do this and will be providing more information here when we've fully investigated the incident. We also wanted to let you know that for this particular event, we'll be waiving our standard SLA process and applying the appropriate service credit to all affected customers for the July billing period. Customers will not need to send us an e-mail to request their credits, as these will be automatically applied. This transaction will be reflected in our customers' August billing statements.
Kudos for Amazon for issuing a billing adjustment. However, as we all know, the business cost of a service failure vastly exceed the cost you pay for the service. If your applications were offline for eight hours because Amazon S3 was malfunctioning, that really hurts. This wasn’t their first service failure: Amazon S3 went down in February as well.
Less significant to enterprises, but just as annoying to those concerned, involved hosted e-mail accounts hosted on Apple’s MobileMe service. MobileMe is the new name of the .Mac service, and the service was updated in mid-July along with the launch of the iPhone 3G. Unfortunately, not everything worked right, and e-mail’s been problematic for days. And as you can see from Apple’s dashboard, a fair number of subscribers can’t access their e-mail. Currently, this affects about 1% of their subscribers—but it's been like that since last Friday.
According to Apple, “We understand this is a serious issue and apologize for this service interruption. We are working hard to restore your service.”
This reminds me of the poem from that great Maine writer, Henry Wadsworth Longfellow:
There was a little girl
Who had a little curl
Right in the middle of her forehead;
And when she was good
She was very, very good,
But when she was bad she was horrid.
Alan Zeichick is editorial director of SD Times. Read his blog at ztrek.blogspot.com.
Share this link: http://www.sdtimes.com/link/32604