Digging into Windows Azure

Patrick Hynds
March 29, 2012

Reliability matters
A critical assumption about Windows Azure has been that hosting a solution on such a massive system, run by a company whose many properties have strong uptime records, will translate directly into uptime for your solution. But that assumption takes time to prove. Reliability is like security: It is all about risk and probability. The more resources and effort you invest, the higher the uptime you should expect, but the devil is in the details, and costs rise very quickly as each additional gain in reliability gets smaller.

During the Dot-Com boom, the discussion was about the “number of nines” you could achieve. My experience with server-based solutions in general is that 99% uptime is easy to get with redundant power and ping (a second path to the Internet), plus a little extra spent on server specs such as error-correcting memory and RAID for the disks. That sounds good until you do the math and realize that 1% downtime is more than three full days a year of the solution being offline. At 99.9% reliability, downtime drops by a factor of ten, to a little under 9 hours a year.
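Why a second path helps so much can be sketched with a little probability. This is a minimal illustration, assuming the redundant copies fail independently of each other, which is a big assumption: a shared upstream fault or a common software bug can take down every copy at once. The 95% figure for a single link and the function name are hypothetical.

```python
def parallel_availability(component: float, copies: int) -> float:
    """Availability of `copies` redundant components, any one of which suffices.

    Assumes failures are independent, which is rarely fully true in practice.
    """
    if not 0.0 <= component <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one copy")
    # The system is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component) ** copies


# Hypothetical figure: a single Internet link that is up 95% of the time.
single = parallel_availability(0.95, 1)
dual = parallel_availability(0.95, 2)
print(f"one link:  {single:.4f}")  # 0.9500
print(f"two links: {dual:.4f}")    # 0.9975
```

Under the independence assumption, two 95% links are down together only 0.25% of the time; correlated failures erode that gain quickly.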

There are levels beyond this, of course, all the way to the coveted “5 nines,” which represents 99.999%, or only about five minutes of downtime per year, but most solutions do quite well at 99.9%. The world does not simply do what it takes to deliver 99.999% or even 100% reliability because the former is ridiculously expensive and tricky to accomplish, and the latter is flat-out impossible in the real world.
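The arithmetic behind the nines is easy to check. This sketch assumes a 365-day year (8,760 hours) and treats availability as a simple percentage of wall-clock time; the function name is mine, for illustration only.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; ignores leap years


def downtime_hours_per_year(availability_percent: float) -> float:
    """Expected hours offline per year at a given availability percentage."""
    return (100.0 - availability_percent) / 100.0 * HOURS_PER_YEAR


for level in (99.0, 99.9, 99.999):
    hours = downtime_hours_per_year(level)
    print(f"{level}%: {hours:.2f} hours ({hours * 60:.1f} minutes) per year")
```

Run as-is, it reports roughly 87.6 hours (about 3.65 days) at 99%, 8.76 hours at 99.9%, and about 5.3 minutes at 99.999%.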

Microsoft and the other cloud providers are loath to state overtly in their Service Level Agreements (SLAs) exactly what levels their services support. I have yet to see any of them present an actual number in terms of time.

Microsoft has an entry in its support FAQ that asks, “How will the Windows Azure, SQL Azure, Caching, Service Bus and Access Control SLA agreements work with current on-premise Microsoft licensing agreements?” The answer is, “Windows Azure, SQL Azure, Caching, Service Bus and Access Control are independent of our on-premises Microsoft licensing agreements. Our SLAs for Windows Azure provide you a monthly uptime guarantee for those services you consume in the cloud, with SLA credits against what we have billed you in the event we fail to meet the guarantee.”

My understanding is that if there is an outage, the compensation comes as a credit for the hours during which the system was down.
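To make that reading concrete, here is a minimal sketch of how a monthly uptime figure and a credit could be computed. The hourly rate and the 99.9% threshold are hypothetical placeholders, and the credit rule follows my understanding above rather than the actual SLA text, which is the document to check for real terms.

```python
from dataclasses import dataclass


@dataclass
class MonthlyBill:
    hours_in_month: float     # e.g. 720 for a 30-day month
    hourly_rate: float        # hypothetical price per hour of service
    guaranteed_uptime: float  # hypothetical SLA threshold, as a fraction


def monthly_uptime(hours_in_month: float, downtime_hours: float) -> float:
    """Fraction of the month the service was available."""
    return (hours_in_month - downtime_hours) / hours_in_month


def sla_credit(bill: MonthlyBill, downtime_hours: float) -> float:
    """Credit under the 'credit for the hours down' reading of the SLA.

    Returns 0 when the month's uptime still meets the guarantee.
    """
    uptime = monthly_uptime(bill.hours_in_month, downtime_hours)
    if uptime >= bill.guaranteed_uptime:
        return 0.0
    return downtime_hours * bill.hourly_rate


bill = MonthlyBill(hours_in_month=720, hourly_rate=0.12, guaranteed_uptime=0.999)
print(sla_credit(bill, downtime_hours=0.5))  # within the guarantee: no credit
print(sla_credit(bill, downtime_hours=10))   # below it: 10 hours credited
```

Note that at 99.9% a 720-hour month tolerates about 43 minutes of downtime before any credit is owed.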

It is not surprising that, while they are still pioneering, the providers do not want to name a number and be held to it. After all, they will be judged by their actual performance in the end. Unfortunately, there has been plenty to judge, with Amazon’s EC2 and Microsoft Windows Azure both having outages in the last year or so from failures that were assumed to have been designed out of the system. Most recently, Azure had a problem triggered by February’s leap day. To its credit, Microsoft has been very transparent about what happened, including status updates and a post to the Windows Azure blog explaining the failure. You can read the summary online, but the gist is that a bad calculation, combined with an update that was rolling out at the time, set off the problem and accelerated the outage.
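The leap-day failure mode is easy to reproduce. The sketch below assumes the bad calculation was the classic “same date next year” mistake of bumping only the year field; it illustrates that class of bug and is not Microsoft's code. The function names are hypothetical.

```python
from datetime import date


def naive_one_year_later(d: date) -> date:
    # Buggy: just bumps the year field. date.replace raises ValueError
    # when the result does not exist, e.g. February 29 in a non-leap year.
    return d.replace(year=d.year + 1)


def safe_one_year_later(d: date) -> date:
    # One possible fix: fall back to February 28 when Feb 29 has no match.
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


leap_day = date(2012, 2, 29)
print(safe_one_year_later(leap_day))  # 2013-02-28
try:
    naive_one_year_later(leap_day)
except ValueError as err:
    print("naive version failed:", err)
```

The naive version works on every other day of the year, which is exactly why this kind of bug can sit unnoticed until February 29 arrives.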

As stated before, there are no perfect systems and no such thing as 100% uptime. The question is how the provider copes when things go wrong. I have often heard it said that any fool can manage when things are going well, but it takes a pro to manage when things have gone off the rails. If issues are frequent, repeated, or prolonged, confidence will be lost no matter what the stories behind those outages are.

I still plan to use Azure in several upcoming projects, and nothing has changed in that regard, except that I will be sure to double-check what my fallback plans need to be if the platform has an outage. This is a best practice no matter what you use to run your solution if even short stretches of downtime cannot be tolerated. To be fair, I see the outages Amazon had a while back in the same light.
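A fallback plan can be as simple as an ordered list of places to get the work done. This is a minimal sketch, assuming each option is wrapped in a zero-argument callable; primary_region and secondary_site are hypothetical stand-ins for your Azure deployment and whatever backup you choose, not real APIs.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def call_with_fallback(providers: Sequence[Callable[[], T]]) -> T:
    """Try each provider in order and return the first successful result.

    If every provider fails, the last error is re-raised.
    """
    if not providers:
        raise ValueError("at least one provider is required")
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider()
        except Exception as err:  # real code should catch narrower, outage-specific errors
            last_error = err
    assert last_error is not None
    raise last_error


# Hypothetical stand-ins for a primary deployment and a backup site.
def primary_region() -> str:
    raise ConnectionError("primary deployment unreachable")  # simulated outage


def secondary_site() -> str:
    return "served from fallback"


print(call_with_fallback([primary_region, secondary_site]))  # served from fallback
```

In production you would add timeouts, health checks, and logging, and decide ahead of time what degraded service the fallback can realistically offer.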

Moving forward
Windows Azure is a tool that can help you solve problems in much the same way that the .NET Framework on Windows is a tool. We can expect it to change a bit over time as Microsoft seeks to fill in the holes that remain and tries to maintain the balance in its pricing structure to keep the whole system profitable while not scaring away the clientele.

The cloud is not guaranteed to be part of your IT future in the next few years, but it is a good bet that it will be, one way or another, even if only via the solutions your vendors provide. The options can be confusing, so it is best to tackle understanding them one variable at a time. Most organizations I have polled that are using Azure are happy with what it provides and are planning to expand their use going forward. The recent outage is a bump in the road, to be sure; however, the savings and other benefits to be achieved are too great to stop trying.

Editor's note: In response to customer feedback, Microsoft has simplified its branding so that the AppFabric name is now part of the broader Windows Azure brand. Features and services, including the Service Bus technology, remain unchanged.