You may already know that Gordian Project users are in the cloud. Well, on Tuesday, we hit our second bump in the road with Google Apps. An outage. You may say to yourself, "An outage? With Google Apps? Really?" Well. Yes. Really. Totally freakin' down. Apparently, Google had an issue Tuesday morning that brought down the email interface for Apps users. Déjà vu?
Here is the error I got in Chrome:
At first I thought to myself, "Hmmm. That's weird." So I literally waited 30 seconds and tried again. Same thing. So I asked the person next to me to try. Same thing. So, I tried my iPhone and got:
OK. Seems likely to be a global problem. So, I alerted users on our network that I was aware of an issue with Google Apps and was looking into it. Because the error says, "Please try again in 30 seconds," I figured it would be a temporary outage and waited only a few minutes. The problem persisted. So I checked Google News and, sure enough, there was a widely recognized outage. From the news, I noticed two things that were particularly interesting:
- I wonder if the "tip-toeing" of Wave into Apps created yesterday's havoc.
- Google has an Apps Status Dashboard
So, after I found out that there was an Apps Status Dashboard, I checked it out and here's what I got:
Google, why didn't you show this to me on the 502 error page? Instead, you told me to try every 30 seconds. I can't imagine how many people wasted hours of their day refreshing every 30 seconds to try to get to critical email. You may remember this article highlighting good custom error pages.
After the incident was stabilized, Google posted an incident report here. According to the report, Google "underestimated the increased load that some of the new updates placed on request routing." Not sure what the "new updates" were, but it doesn't seem like Google should underestimate the anticipated load.
Noting the red "X" by Google Mail, I clicked on it at 1:48 PM to find:
It says there will be an update at 1:53 PM, so I waited until 1:58 PM and clicked again:
Hey! Wait just a second! Ten minutes ago there was not an update at 1:02 PM. What gives, Google? Don't you know that 45 minutes after I announced it to everyone, people are still coming to my desk to say "Hey Josh. My email's down."? Please, just tell me what I need to know when you know it! Also, I love that there is a link to "How to use IMAP or POP", where the first step outlined is to "Enable POP or IMAP in your Google Apps email account". I can't get to my Apps account! Then I realized I already had IMAP enabled on my account and had it set up in Outlook. So, I started up Outlook... only to be woefully reminded of why I wanted so desperately to switch to Google Apps to begin with. I quit Outlook before I even used it, as it was either Outlook or every other application, and a choice had to be made. Instead, I waited for the Google update. At 2:40 PM I refreshed the Status Dashboard to find:
Hooray! We're back up! Not without a few lessons.
- Google, or any other cloud service provider: when a critical service goes down, don't show me an error that tells me to retry every 30 seconds, especially if that's not really what you want me to do. Send me to the place with the relevant information. I know, based on your incident reports, that you "published ongoing reports to the Google Apps dashboard, Gmail Help Center, the Enterprise and Gmail blogs, and the GoogleAtWork and Google Twitter feeds, to help provide customers with the latest status and available workarounds," but the error was unhelpful. Please don't make me Google it.
- IT managers, if you're going to start using SaaS and cloud-enabled services, find out, in advance, what the notification mechanism is for outages. In this case, it would have been a simple thing to have added the Apps Status Dashboard to one of my feeds.
- Don't count on Google Apps, or any other cloud service, being available 100% of the time. If you have a critical meeting or a conference call that requires you to have a cloud-stored document or email or presentation up and ready to go, make sure it's ready and pulled up long before your event, or make sure to store it locally as well. Also, based on Google's comments, it may be good to enable IMAP on your account just in case you can't get to your email on the web; at least then you can get to critical emails with Outlook or Thunderbird.
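For the second lesson, here is a minimal sketch of what "adding the dashboard to one of my feeds" could look like if you'd rather have a script watch it for you. It assumes the status page publishes a plain RSS 2.0 feed. The sample feed, its dates, and the function names below are made up for illustration and are not Google's actual feed format.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed (assumption): a real status feed may use different fields.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Status Dashboard</title>
    <item>
      <title>Mail: service disruption</title>
      <pubDate>Tue, 01 Jan 2030 13:02:00 GMT</pubDate>
      <description>We are investigating reports of an issue.</description>
    </item>
    <item>
      <title>Calendar: no issues</title>
      <pubDate>Tue, 01 Jan 2030 09:00:00 GMT</pubDate>
      <description>Service is operating normally.</description>
    </item>
  </channel>
</rss>"""


def parse_status_feed(xml_text):
    """Turn RSS 2.0 text into a list of dicts, one per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "published": item.findtext("pubDate", default="").strip(),
            "summary": item.findtext("description", default="").strip(),
        })
    return items


def unseen_alerts(items, seen, keyword="mail"):
    """Return items whose title mentions `keyword` and that were not reported yet.

    `seen` is a set of (title, published) pairs; it is updated in place so the
    same update is not announced twice on the next poll.
    """
    alerts = []
    for item in items:
        key = (item["title"], item["published"])
        if keyword.lower() in item["title"].lower() and key not in seen:
            seen.add(key)
            alerts.append(item)
    return alerts


if __name__ == "__main__":
    seen = set()
    for alert in unseen_alerts(parse_status_feed(SAMPLE_FEED), seen):
        print(alert["published"], "-", alert["title"])
```

In real use you would fetch the feed text from whatever URL your provider documents (for example with `urllib.request.urlopen`) every few minutes and send the alerts wherever your users will actually see them. The point is to find out about an outage from the provider, not from the line forming at your desk.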