
Zeichick's Take: When the cloud was good, it was very, very good. But when it was bad, it was horrid.




July 24, 2008 — 
Cloud computing took a big hit this week amid two significant service outages. The bigger one, at least as far as enterprise computing is concerned, was the eight-hour failure of Amazon’s Simple Storage Service (S3). Check the Amazon Web Services Service Health Dashboard and select Amazon S3 in the United States for July 20. You’ll see that problems began at 9:05 a.m. Pacific Time with “elevated error rates,” and that service wasn’t reported as fully restored until 5 p.m. Amazon explained the failure this way:

We wanted to share a brief note about what we observed during yesterday's event and where we are at this stage. As a distributed system, the different components of Amazon S3 need to be aware of the state of each other. For example, this awareness makes it possible for the system to decide to which redundant physical storage server to route a request. In order to share this state information across the system, we use a gossip protocol. Yesterday, we experienced a problem related to gossiping our internal state information, leaving the system components unable to interact properly and causing customers' requests to Amazon S3 to fail. After exploring several alternatives, we determined that we had to temporarily take the service offline so that we could clear all gossipped state and restart gossip to rebuild the state.

These are sophisticated systems and it generally takes a while to get to root cause in such a situation. We're working very hard to do this and will be providing more information here when we've fully investigated the incident. We also wanted to let you know that for this particular event, we'll be waiving our standard SLA process and applying the appropriate service credit to all affected customers for the July billing period. Customers will not need to send us an e-mail to request their credits, as these will be automatically applied. This transaction will be reflected in our customers' August billing statements.
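Amazon's note mentions a gossip protocol, in which each server periodically swaps what it knows with a randomly chosen peer until the whole cluster agrees. The Python sketch below is a toy illustration of that general idea, assuming simple versioned key/value state and push-pull exchanges; it is not Amazon's S3 implementation, and the `Node` class, the `rounds_until_converged` helper, and the `server-17` entry are all hypothetical.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical cluster member holding versioned state: key -> (version, value)."""
    name: str
    state: dict = field(default_factory=dict)

    def update_local(self, key, value):
        # Bump the version so newer information wins when nodes merge.
        version = self.state.get(key, (0, None))[0] + 1
        self.state[key] = (version, value)

    def merge(self, other_state):
        # Keep whichever entry has the higher version number.
        for key, (version, value) in other_state.items():
            if version > self.state.get(key, (0, None))[0]:
                self.state[key] = (version, value)


def gossip_round(nodes, rng):
    """Each node exchanges state with one randomly chosen peer (push-pull)."""
    for node in nodes:
        peer = rng.choice([n for n in nodes if n is not node])
        # Snapshot both sides first so the exchange is symmetric within the pair.
        mine, theirs = dict(node.state), dict(peer.state)
        node.merge(theirs)
        peer.merge(mine)


def rounds_until_converged(num_nodes=50, seed=1, max_rounds=100):
    """Return (rounds needed for every node to agree, nodes), or (None, nodes)."""
    rng = random.Random(seed)
    nodes = [Node(f"node{i}") for i in range(num_nodes)]
    # Hypothetical event: one node learns that a storage server is unreachable.
    nodes[0].update_local("server-17", "unreachable")
    for r in range(1, max_rounds + 1):
        gossip_round(nodes, rng)
        if all(n.state == nodes[0].state for n in nodes):
            return r, nodes
    return None, nodes


if __name__ == "__main__":
    rounds, nodes = rounds_until_converged()
    print(f"all {len(nodes)} nodes agreed after {rounds} rounds")
```

Because the number of informed nodes grows roughly exponentially each round, agreement arrives quickly. In this toy model, a bad entry carrying a high version number would spread just as fast as a good one, which suggests why a system in trouble might have to clear all gossiped state and rebuild it, as Amazon describes.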


Kudos to Amazon for issuing a billing adjustment. However, as we all know, the business cost of a service failure vastly exceeds the price you pay for the service. If your applications were offline for eight hours because Amazon S3 was malfunctioning, that really hurts. Nor was this Amazon’s first service failure: Amazon S3 went down in February as well.

The second outage, less significant to enterprises but just as annoying to those affected, involved e-mail accounts hosted on Apple’s MobileMe service. MobileMe is the new name of the .Mac service, which was updated in mid-July along with the launch of the iPhone 3G. Unfortunately, not everything worked right, and e-mail has been problematic for days. As you can see from Apple’s dashboard, a fair number of subscribers can’t access their e-mail. The problem currently affects about 1% of subscribers, but it has persisted since last Friday.

According to Apple, “We understand this is a serious issue and apologize for this service interruption. We are working hard to restore your service.”

This reminds me of the poem by that great Maine writer, Henry Wadsworth Longfellow:

There was a little girl
Who had a little curl
Right in the middle of her forehead;
And when she was good
She was very, very good,
But when she was bad she was horrid.

Alan Zeichick is editorial director of SD Times. Read his blog at ztrek.blogspot.com.


