Rackspace’s Fanatical Support is Not Immune to Outage Notification Delays

We at Systems Watch believe Rackspace’s Cloud offering is a very good service, and it comes with what is often called Fanatical Support. However, even Rackspace is not immune to the classic problems of delayed outage communication and ambiguous time frames when issues arise.

Early this morning, at approximately 3:47 AM EDT on 06/20/2013, Systems Watch detected an issue with Rackspace Cloud Servers in the Chicago (ORD) region. According to our beacons in the region, the outage lasted until approximately 6:48 AM EDT on 06/20/2013, an outage window of about 3 hours. On further investigation, we found that Rackspace had scheduled network maintenance in the ORD region for 12:01 AM – 5:00 AM CDT, with a stated maximum of 10 minutes of outage, if any:

This is the final outage incident recap from Rackspace:

This is what Systems Watch Realtime Graph showed:

This is what Systems Watch Twitter Notification showed:
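Because the outage times above are quoted in EDT and the maintenance window in CDT, it helps to put everything on one clock before comparing them. The following is a minimal sketch using Python's standard `zoneinfo` module (Python 3.9+, with system time zone data available); it uses only the times stated in this post and assumes the ORD maintenance window was announced in US Central time. The variable names and the `to_eastern` helper are purely illustrative.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")  # observes EDT in June 2013
CENTRAL = ZoneInfo("America/Chicago")   # observes CDT in June 2013

# Outage as observed by Systems Watch beacons (EDT).
outage_start = datetime(2013, 6, 20, 3, 47, tzinfo=EASTERN)
outage_end = datetime(2013, 6, 20, 6, 48, tzinfo=EASTERN)

# Announced maintenance window for ORD (assumed to be quoted in CDT).
window_start = datetime(2013, 6, 20, 0, 1, tzinfo=CENTRAL)
window_end = datetime(2013, 6, 20, 5, 0, tzinfo=CENTRAL)


def to_eastern(dt: datetime) -> str:
    """Hypothetical helper: render an aware datetime on the Eastern clock."""
    return dt.astimezone(EASTERN).strftime("%H:%M %Z")


# Subtracting aware datetimes with different zones compares them in UTC.
duration = outage_end - outage_start
overrun = outage_end - window_end

print("Observed outage duration:", duration)
print("Maintenance window in Eastern time:",
      to_eastern(window_start), "-", to_eastern(window_end))
print("Outage continued past the window by:", overrun)
```

Run as-is, this reports an observed outage of 3 hours 1 minute, shows the maintenance window as 01:01 – 06:00 EDT, and shows the outage continuing 48 minutes past the announced end of the window.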

The Issue At Hand

The issue here is not really the outage itself. Outages happen, and most people in operations understand this. In this case, a scheduled network maintenance went bad and the service disruption lasted longer than the stated time. We have all experienced this at one time or another. The greater issue, and one that many cloud service providers share, is reporting the exact time frame of the event. In this situation, there is a 1-hour discrepancy between when Rackspace states the outage occurred and when Systems Watch detected it.

We do not believe this was intentional or a deliberate way of shaving 1 hour off the outage time. We do, however, believe it is a problem most organizations run into. When an outage occurs, validation and investigation begin immediately, using automated and manual tools, human and machine, to confirm the problem and isolate the cause. During this time, most organizations are wary of reporting anything until they fully understand the issue, which can delay a public notification by anywhere from minutes to hours. By the time the person responsible for communicating with the public has solid evidence that an outage occurred, people on Twitter have probably already been reporting it.

In our view, the better course of action is to report an outage before having complete, solid evidence and to correct the statement later if needed. The larger the organization, the harder this can be to implement, whether because of the escalation chain or PR concerns, but we find that clients appreciate knowing an issue outside their control is happening rather than second-guessing their own systems.

Rackspace’s support compares very favorably with that of other cloud providers, but there is always room for improvement, even for a great service like Rackspace.