On July 2, 2009, at approximately 11:15 PM PST, as I was lying snugly in bed, I had nary a clue that Authorize.net was going down. Apparently, there was a fire at the data center. Many of you have probably already read this story, or perhaps you are (or were) an Authorize.net customer and found out through the errors. I want to share our experience with the outage and explain why the lesson is valuable for others in eCommerce. I could be writing a blog post about how much it sucks that Authorize.net went down at all, given that only one data center was scorched. I could be asking, "Where in the world was the backup data center?" or "Rackspace, how does a sprinkler system kill backup generators? Where is the backup for your backup's backup?!" I could also be writing about how quickly the status of the situation was updated, thanks to some smart person at Authorize.net creating a Twitter account that day. But this is a post about a shift in our philosophy.
So, as I slept, I had no idea that our payment gateway, Authorize.net, was getting wrecked by a fire. When I woke up at about 7:30 AM, I had already received a text from one of the managing partners alerting me that we had received a pile of errors related to Authorize.net failures. Let me explain what that means for us. An Authorize.net failure error from our order system means that an order was placed, but no payment was collected. You might be saying to yourself, "Why in the heck would you guys place an order without first confirming that payment could be collected?!" Well, the reason is simple and goes back to our roots. Early in our eCommerce venture, we prayed for orders every day. Our strategy on many fronts, including payment processing, was to protect the order: let the customer place it on the site without a failure, an error, or any other barrier to the transaction, and clean up the mess manually afterward. As a consequence, when our call to Authorize.net for payment processing failed, we would end up with a placed order in our system but no payment from Authorize.net. An email with the customer's information would then be forwarded to the tech team, who would review it and contact the customer to process a manual payment. This was a good solution for us early on, as we didn't want to miss a single transaction. We needed the business!
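To make the "protect the order" approach concrete, here is a minimal Python sketch of that fail-open flow. It is an illustration, not our actual order system and not Authorize.net's API: `authorize` is a hypothetical callable standing in for the gateway call, and `OrderSystem`, `GatewayError`, and `manual_review` (a stand-in for the email to the tech team) are all hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class GatewayError(Exception):
    """Raised by the hypothetical gateway client when a payment call fails."""


class OrderStatus(Enum):
    PAID = "paid"
    PAYMENT_FAILED = "payment_failed"  # order kept anyway; needs manual follow-up


@dataclass
class Order:
    order_id: int
    email: str
    amount_cents: int
    status: OrderStatus


@dataclass
class OrderSystem:
    orders: list = field(default_factory=list)
    # Hypothetical stand-in for "email the customer info to the tech team".
    manual_review: list = field(default_factory=list)

    def place_order_fail_open(self, order_id: int, email: str, amount_cents: int,
                              authorize: Callable[[int], str]) -> Order:
        """'Protect the order': the customer never sees a payment error."""
        try:
            authorize(amount_cents)
            status = OrderStatus.PAID
        except GatewayError:
            # The order is still created; payment is chased down by hand later.
            status = OrderStatus.PAYMENT_FAILED
        order = Order(order_id, email, amount_cents, status)
        self.orders.append(order)
        if status is OrderStatus.PAYMENT_FAILED:
            self.manual_review.append(order)
        return order


# Usage: simulate a gateway outage.
def gateway_down(amount_cents: int) -> str:
    raise GatewayError("gateway unavailable")


system = OrderSystem()
order = system.place_order_fail_open(1, "customer@example.com", 2500, gateway_down)
print(order.status.value)  # payment_failed
```

The trade-off is visible in the code: every failed authorization silently becomes a row in `manual_review`, which is cheap when that happens once or twice and expensive when the gateway is down for hours.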
This process worked for us for a good long while. Authorize.net downtime was very rare, and when it did happen it usually affected only one or two orders, which were easy to resolve. We hadn't had the issue in quite a while, and it had not occurred to us that we would need to update our strategy as we grew. Then we grew. By a lot. The consequences of handling more than ten times the volume we had four years ago become readily apparent when your payment processor goes down for more than eight hours. By the morning after Authorize.net went down, we had a few dozen orders that had been placed without payment but had already begun processing for shipment at warehouses across the country. So, we had quite a cleanup effort on our hands. First, make sure the orders don't get shipped out the door, since they haven't been paid for. Second, disable the Authorize.net payment method until we can figure out what happened (or is still happening) and confirm when they're back up. Third, contact each of the customers to see whether they would like to complete a manual transaction over the phone or retry their checkout using Google Checkout or PayPal (two services that we did not offer at launch). Fourth, document the events and share them with the management team. Finally, fix our outdated checkout so that orders are not placed when the authorization transaction fails.
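The fix in that last step amounts to flipping the checkout from fail-open to fail-closed: authorize first, and create the order only if authorization succeeds. Below is a minimal Python sketch of that idea under the same assumptions as before. `authorize` is a hypothetical callable that returns an authorization code or raises `GatewayError`; `Store`, `CheckoutResult`, and the error message are hypothetical and do not reflect Authorize.net's actual API or our production code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class GatewayError(Exception):
    """Raised by the hypothetical gateway client when authorization fails."""


@dataclass
class Order:
    order_id: int
    email: str
    amount_cents: int
    auth_code: str  # every order that exists has a successful authorization


@dataclass
class CheckoutResult:
    order: Optional[Order]
    error: Optional[str]


@dataclass
class Store:
    orders: list = field(default_factory=list)
    next_id: int = 1

    def checkout_fail_closed(self, email: str, amount_cents: int,
                             authorize: Callable[[int], str]) -> CheckoutResult:
        """Authorize first; create the order only if authorization succeeds."""
        try:
            auth_code = authorize(amount_cents)
        except GatewayError:
            # No order is created, so nothing unpaid can reach a warehouse.
            # The customer is asked to retry or pick another payment method.
            return CheckoutResult(
                None, "We couldn't process your payment. "
                      "Please try again or choose another payment method.")
        order = Order(self.next_id, email, amount_cents, auth_code)
        self.next_id += 1
        self.orders.append(order)
        return CheckoutResult(order, None)
```

With this shape, a gateway outage shows up as errors at checkout instead of as a backlog of unpaid orders already moving through fulfillment, which is exactly the cleanup the steps above had to undo by hand.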
An update to our checkout process, along with a site redesign, has been on our radar for some time now. This issue, however, escaped us until it became a pain. Hopefully, it will help us think critically about updates to our site as we examine where we were, where we are, and where we're going.