You may or may not know that on Thursday, May 27, 2010, PayPal suffered a significant issue. This was nothing like the $2,000-per-second outage that PayPal faced in August of 2009. Instead, a logic error in PayPal's risk model led to a higher-than-normal chance of a transaction being declined. According to a source at PayPal, the issue affected PayPal's direct payments system (not PayPal Express Checkout) and their virtual terminal. I was told that this was an "all hands on deck" incident for PayPal.
The issue began just before 8:30 AM Pacific time and lasted until about 4 PM. Our customer service team notified me just before 9 AM that a number of transactions had been declined for, seemingly, no good reason. Later in the day, I received an update that the issue affected approximately 15% of PayPal transactions. We were likely to see a higher failure rate in our customer service center, though, since customers who had experienced an issue were likely to contact us, try again, and fail again. We saw a failure rate closer to 50-60% during the issue period.
In development, we had already planned to start work shortly on a backup payment processor process. In August of 2009, we had been directly, and severely, impacted by an Authorize.net downtime. We knew that we would face payment processor issues again at some point. Apparently, it's inevitable. Given the number of customers who contacted us about the current issue, the untold number of customers who did not contact us, the thousands of dollars in lost revenue, and the poor customer experience, we knew that we would need to bump this project to A1 priority. At this point we already had experience transacting securely with two payment processors, and had already begun mapping out the new process. The dev began.
By about 4 PM we had wrapped up testing the new process and were ready to push it live when I heard from PayPal that their risk model issue had been resolved and that transactions had returned to normal. Classic. Well, we didn't beat the PayPal clock, but we did learn some things.
1) Payment processors fail. As much as I'd like to believe that they're committed to five-nines uptime, I know that will never happen.
2) Customers hate getting errors at checkout, especially after they've already entered their credit card information. I know it makes me uneasy when it happens to me, and it takes only one phone call from a panicked customer to know that you've really wrecked the whole experience.
3) It's not a very difficult problem to solve. Chances are you have already spent a great deal of time, energy, and resources negotiating rates, developing a system, reading through an API manual, and following PCI requirements.
Building a backup system is still likely worth the time and effort, if not for the untold number of lost transactions, then for the customer experience. Now I am left wondering, though, 'What happens when both processors fail at the same time?!' ... I am kidding, of course; we just direct them to use Google Checkout.
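For anyone curious what a backup payment processor process can look like, here is a minimal sketch of the general idea: try the processors in order and move to the next one only when the current one is unavailable. Everything in it is a hypothetical stand-in for illustration (the `ChargeResult` type, the `ProcessorUnavailable` exception, and the two demo gateways); it does not call the real PayPal or Authorize.net APIs, and it is not the process described above.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


class ProcessorUnavailable(Exception):
    """Hypothetical error a gateway wrapper raises on an outage or timeout."""


@dataclass
class ChargeResult:
    processor: str       # which processor handled the charge
    approved: bool       # True if the charge was approved, False if declined
    reference: str = ""  # processor-side transaction reference, if any


# A gateway is any callable taking (amount_cents, card_token).
Gateway = Callable[[int, str], ChargeResult]


def charge_with_failover(
    gateways: Sequence[Gateway], amount_cents: int, card_token: str
) -> ChargeResult:
    """Try each gateway in order; fall through only when one is unavailable."""
    last_error = None
    for gateway in gateways:
        try:
            # An approval or an ordinary decline is returned as-is.
            return gateway(amount_cents, card_token)
        except ProcessorUnavailable as exc:
            last_error = exc  # remember the failure and try the next gateway
    raise ProcessorUnavailable("all payment processors failed") from last_error


# Demo stand-ins (assumptions, not real integrations).
def primary_gateway(amount_cents: int, card_token: str) -> ChargeResult:
    raise ProcessorUnavailable("primary processor is down")


def backup_gateway(amount_cents: int, card_token: str) -> ChargeResult:
    return ChargeResult(processor="backup", approved=True, reference="demo-1")


if __name__ == "__main__":
    result = charge_with_failover(
        [primary_gateway, backup_gateway], 1999, "tok_demo"
    )
    print(result.processor, result.approved)
```

Note that this sketch only fails over on an outright error. In an incident like the one above, where the processor answers but wrongly declines, you would also have to decide whether a decline from the primary should be retried on the backup, and a timeout needs care so that a customer is never charged twice.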