A certificate snafu grounded Microsoft's Azure cloud last weekend for more than a day. While the outage led to complaints by some users, one vendor saw the mishap as an opportunity.
The cause of the outage was the failure to renew an SSL certificate, according to Microsoft. The outage began at around 3:44 pm ET last Friday and service wasn't fully restored until 8 pm ET the next day.
To help salve dissatisfaction caused by the outage, Microsoft said it would proactively provide credits to affected customers.
The cloud disruption gave Matt Watson a lot of time to think. After all, his company, Stackify, of Kansas City, Missouri, was knocked off Azure for 12 hours. The brainchild of those ruminations was a free service to help certificate administrators avoid vendors' mistakes.
The service, hosted at CertAlert.me, gives administrators who register at the site an instant report on their domains and sends them reminders about the status of their SSL certificates.
After Azure went down, Watson and company began looking for a service that reminded administrators about certificate expirations and couldn't find one. So the company, which specialises in creating remote application monitoring and troubleshooting tools, set up a cert alert service itself.
Microsoft's certificate faux pas isn't as inept as it might sound, according to Watson. "It's crazy and you'd think someone as big as Microsoft would be able to keep track of it, but it happens quite a lot," he said in an interview.
"It's happened to me, and developers that work for me will tell you it's happened to them, too," he said.
After a certificate is purchased, it's easy for its maintenance to fall between the cracks, he said. "Some vendors send out alerts, but they send those alerts to the person who bought the certificate. If that person leaves the IT department, the alert may not reach who it needs to reach to renew the certificate."
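The kind of expiry check Watson describes can be approximated with a short script run on a schedule, so the warning does not depend on one person's inbox. The following is a minimal sketch in Python using only the standard library; it is not how CertAlert.me works, and the function names and the 30-day warning window are assumptions chosen for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the 'notAfter' string of the certificate served by host.

    Note: the default context verifies the certificate, so a certificate
    that has already expired makes the handshake fail with an SSL error
    instead of returning a date.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between 'now' (timezone-aware) and the expiry date."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when the certificate expires within warn_days (assumed window)."""
    return days_until_expiry(not_after, now) <= warn_days
```

In practice, a scheduled job could call `fetch_not_after` for each domain, pass the result to `needs_renewal` with `datetime.now(timezone.utc)`, and send any warning to a shared team address rather than to the individual who bought the certificate.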
Outages like Azure's can have unwelcome consequences for companies running cloud services, according to Jonathan Braunhut, chief scientist for Kemp Technologies in Yaphank, New York, a maker of network server load balancers and application delivery controllers. Expired SSL certificates can result in customer desertion and vulnerability to cyberattacks, Braunhut said.
He maintained that certificate maintenance is often done manually on a spreadsheet, where it's subject to human error. Complicating matters, certificate renewals can be one, two, even three years apart.
"Generally, people administer rare events less well than they do more frequent events," he said.
Although Microsoft has yet to release details of what went wrong, Braunhut said it appears that the expired certificate was installed in lots of places throughout the Azure infrastructure.
"I suspect the same cert that expired was all over the place," he said. "That's another administrative challenge that they had."
More information on the outage should be available soon, a Microsoft spokesperson said via email. The company is creating a root cause analysis for the disruption and details will be posted to the company's Azure blog when it is completed.