Zeichick's Take: When the cloud was good, it was very very good. But when it was bad, it was horrid.



July 24, 2008
Cloud computing took a big hit this week amid two significant service outages. The bigger one, at least as it affects enterprise computing, was the eight-hour failure of Amazon’s Simple Storage Service. Check out the Amazon Web Services service health dashboard, then select Amazon S3 in the United States for July 20. You’ll see that problems began at 9:05 a.m. Pacific Time with “elevated error rates,” and that service wasn’t reported as fully restored until 5 p.m. About the error, Amazon said:

We wanted to share a brief note about what we observed during yesterday's event and where we are at this stage. As a distributed system, the different components of Amazon S3 need to be aware of the state of each other. For example, this awareness makes it possible for the system to decide to which redundant physical storage server to route a request. In order to share this state information across the system, we use a gossip protocol. Yesterday, we experienced a problem related to gossiping our internal state information, leaving the system components unable to interact properly and causing customers' requests to Amazon S3 to fail. After exploring several alternatives, we determined that we had to temporarily take the service offline so that we could clear all gossipped state and restart gossip to rebuild the state.

These are sophisticated systems and it generally takes a while to get to root cause in such a situation. We're working very hard to do this and will be providing more information here when we've fully investigated the incident. We also wanted to let you know that for this particular event, we'll be waiving our standard SLA process and applying the appropriate service credit to all affected customers for the July billing period. Customers will not need to send us an e-mail to request their credits, as these will be automatically applied. This transaction will be reflected in our customers' August billing statements.
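For readers who haven't run into a gossip protocol before, here is a minimal sketch of the general idea Amazon is describing: each node periodically tells one randomly chosen peer everything it knows, and the peer keeps whichever entries are newer. This is a toy illustration only, not Amazon's implementation; the function names, the version-number merge rule, and the "healthy" status string are all assumptions made up for the example.

```python
import random


def gossip_round(states, rng):
    """One push-style round: every node sends its whole view to one random peer."""
    nodes = list(states)
    for sender in nodes:
        peer = rng.choice([n for n in nodes if n != sender])
        # Merge rule (an assumption for this sketch): the higher version wins.
        for key, (version, value) in list(states[sender].items()):
            if key not in states[peer] or states[peer][key][0] < version:
                states[peer][key] = (version, value)


def run_gossip(num_nodes=8, seed=0, max_rounds=50):
    """Gossip until every node knows about every other node, or give up."""
    rng = random.Random(seed)
    # Each node starts out knowing only its own (hypothetical) status.
    states = {n: {n: (1, "healthy")} for n in range(num_nodes)}
    for round_no in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(len(view) == num_nodes for view in states.values()):
            return round_no, states
    return None, states


if __name__ == "__main__":
    rounds, _ = run_gossip()
    print("rounds until every node had the full picture:", rounds)
```

The appeal is that no central coordinator is needed, and state spreads through the cluster in a handful of rounds. The flip side is that every node's view is assembled from what its peers told it, which is presumably why clearing all gossiped state and rebuilding it meant taking the whole service offline. The sketch says nothing about what actually went wrong at Amazon.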


Kudos to Amazon for issuing a billing adjustment. However, as we all know, the business cost of a service failure vastly exceeds what you pay for the service. If your applications were offline for eight hours because Amazon S3 was malfunctioning, that really hurts. And this wasn’t Amazon’s first service failure: Amazon S3 went down in February as well.
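To put the outage in uptime terms, here is a quick back-of-the-envelope calculation using the times from the health dashboard. It assumes, pessimistically, that the entire window from 9:05 a.m. to 5 p.m. counts as downtime, even though the dashboard describes part of it as "elevated error rates"; the function name is made up for the example.

```python
from datetime import datetime


def monthly_uptime_percent(outage_start, outage_end, days_in_month=31):
    """Percent of the month the service was up, given a single outage window."""
    outage_minutes = (outage_end - outage_start).total_seconds() / 60
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (1 - outage_minutes / total_minutes)


# Times as reported on the dashboard for July 20 (Pacific Time).
start = datetime(2008, 7, 20, 9, 5)
end = datetime(2008, 7, 20, 17, 0)

if __name__ == "__main__":
    print(f"July uptime: {monthly_uptime_percent(start, end):.2f}%")
```

Under that assumption, one bad day leaves July's uptime just under 99 percent. Compare that with whatever figure your provider's SLA promises, and remember that the credit covers your storage bill, not your lost business.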



