On October 25th, 2022, the popular instant messaging platform WhatsApp suffered a major global outage affecting over 2 billion users who rely on WhatsApp for communication and payments. The outage lasted for about two hours, rendering WhatsApp inaccessible for users in several parts of the world, including prominent markets such as India and the UK.
As WhatsApp services resumed after a two-hour-long downtime, a Meta company spokesperson said in an email statement, “We know people had trouble sending messages on WhatsApp today. We have fixed the issue and apologize for any inconvenience.” Although the company remained tight-lipped about the actual cause of the outage, it claimed that the outage was a result of a technical error.
WhatsApp is a critical means of communication for people, businesses, and governments worldwide. According to Downdetector, an outage reporting site, tens of thousands of frustrated users reported the outage online. Small businesses were hit especially hard, as many depend on WhatsApp for communication and payments. When WhatsApp suffered a blackout for nearly six hours in October 2021, the impact was far-reaching, affecting a wide range of asset trading, from cryptocurrencies to oil. It is therefore essential that companies understand the cascading effects of outages and implement the right measures to prevent them in the first place.
Sometimes, All It Takes Is One Expired Certificate!
While the root cause of the latest WhatsApp outage was not disclosed, a common culprit behind outages is an expired SSL/TLS certificate. Organizations typically use SSL/TLS certificates for authentication, encryption, and data integrity. They play a critical role in enabling secure and trusted communication between applications, devices, and servers. These certificates are issued with a fixed validity period to reduce the risk of misuse by threat actors, so they must be constantly monitored and renewed in time. If they are not, they expire and force the applications, workloads, or devices that depend on them offline.
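To make the validity period concrete, here is a minimal Python sketch of an expiry check. It assumes the certificate's "notAfter" value is available as a string in the format Python's `ssl` module uses; the helper names `days_until_expiry` and `is_expired` are illustrative and do not come from any particular product.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the days left before notAfter (zero or negative once expired)."""
    # ssl.cert_time_to_seconds parses strings such as "Jan  5 09:34:43 2018 GMT".
    expires_at = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires_at - now).total_seconds() / 86400


def is_expired(not_after: str, now: datetime) -> bool:
    """A certificate past its notAfter time is rejected by TLS clients."""
    return days_until_expiry(not_after, now) <= 0
```

In practice the "notAfter" string would be read from the live certificate, and `now` would be `datetime.now(timezone.utc)`.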
Frequent Renewals are Not the Problem. Manual Processes Are.
Many organizations still use spreadsheets or home-grown tools to monitor digital certificates, and manual processes to renew and provision them. While this may work for infrastructures with a handful of certificates, manually tracking hundreds or thousands of certificates for expiry becomes an unimaginably tedious task as the number grows. As a result, PKI and security teams often miss expiry dates, which leads to application and service outages that cause downtime, lost revenue, and reputational damage.
Certificate-related outages that made global headlines in the recent past include those at Google Voice, Microsoft Exchange, and, most recently, Spotify-owned Megaphone, which was down for 8 hours!
If there is an important lesson to be learned from the above outages, it is that no organization, however powerful, is immune to them. When it comes to preventing expiry-related outages, manual processes and ad hoc tools are grossly inadequate. Expiring certificates need to be renewed on time, and with manual certificate management, renewals are easily overlooked or forgotten.
Certificate Management Best Practices to Prevent Expiry-Related Outages
Steering clear of certificate expiry-related outages isn’t necessarily difficult if you manage certificates properly and efficiently. Here are some certificate management best practices you can follow to save your organization from the severe financial and reputational damage of an outage:
1. Build Certificate Visibility
Complete visibility of all certificates in your infrastructure is essential to stay on top of expiring certificates and prevent outages. Discovering and building a central inventory of certificates with all the necessary information, such as expiry timelines, crypto standards, certificate location, and issuing certificate authority, greatly helps in proactively monitoring certificates for expiry and initiating timely renewals. Even in the event of an outage, access to certificate information helps quickly identify the expired certificate and its location, and renew it to restore services.
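As an illustration of what a central inventory might record, the sketch below models one entry with the fields mentioned above and lists the certificates that are due soonest. The `CertificateRecord` type, the `expiring_within` helper, and the sample hosts are all hypothetical; a real inventory would be populated by discovery scans rather than typed in by hand.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class CertificateRecord:
    common_name: str      # subject the certificate was issued to
    location: str         # where it is installed (hypothetical labels)
    issuer: str           # issuing certificate authority
    key_algorithm: str    # crypto standard in use
    not_after: datetime   # expiry time


def expiring_within(
    inventory: List[CertificateRecord], days: int, now: datetime
) -> List[CertificateRecord]:
    """Return certificates expiring within `days` (or already expired), soonest first."""
    cutoff = now + timedelta(days=days)
    due = [cert for cert in inventory if cert.not_after <= cutoff]
    return sorted(due, key=lambda cert: cert.not_after)


# Sample data for illustration only.
NOW = datetime(2022, 10, 25, tzinfo=timezone.utc)
INVENTORY = [
    CertificateRecord("shop.example.com", "lb-01:443", "Example CA",
                      "RSA-2048", NOW + timedelta(days=200)),
    CertificateRecord("api.example.com", "k8s/ingress", "Example CA",
                      "ECDSA-P256", NOW + timedelta(days=12)),
    CertificateRecord("mail.example.com", "mx-02:25", "Internal CA",
                      "RSA-2048", NOW - timedelta(days=1)),
]
```

Calling `expiring_within(INVENTORY, 30, NOW)` returns the already expired mail certificate first, followed by the API certificate that has 12 days left.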
2. Use Alerting and Reporting
Implementing automated alerting and reporting for events like certificate expiry is crucial for outage prevention. Alerts can be delivered to certificate owners via email for manual action, or via the Simple Network Management Protocol (SNMP) for automation. Sending expiration alerts well before the expiry date helps certificate owners update or renew the certificate in time and avoid a last-minute rush.
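One simple way to alert "well before the date of expiry" is to escalate through tiers as the date approaches. The sketch below assumes 30-, 14-, and 7-day tiers; these numbers and the level names are illustrative assumptions, not a recommendation from any standard, and the actual delivery by email or SNMP is left out.

```python
from typing import Optional

# Hypothetical escalation tiers: (days remaining, alert level), tightest first.
ALERT_THRESHOLDS = [(7, "critical"), (14, "warning"), (30, "notice")]


def alert_level(days_left: float) -> Optional[str]:
    """Map the days left on a certificate to an alert level, or None if no alert is due."""
    if days_left <= 0:
        return "expired"
    for limit, level in ALERT_THRESHOLDS:
        if days_left <= limit:
            return level
    return None
```

A scheduled job could run this against every inventory entry each day and notify the owner whenever the level changes.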
3. Establish Certificate Ownership
Ambiguous certificate ownership is another common reason for certificate-related outages. Unanswered questions often create confusion that leads to an outage: Who should have tracked and monitored the certificate? Was the approval process followed? Who provisioned it? Who was responsible for enforcing a security policy around certificate issuance?
Establishing certificate ownership, or in other words, delegating the management responsibility of certificates and keys, is one of the first steps toward preventing outages. Define an ownership hierarchy, an approval workflow, and a certificate enrollment process to provide the right information to the right people at the right time and enforce accountability for certificate actions. Doing so helps eliminate the existence of rogue, undocumented certificates that expire and cause outages.
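Ownership gaps can also be surfaced mechanically. The following sketch assumes two hypothetical inputs: a list of certificate names found by discovery and a mapping from certificate name to owning team. It flags every certificate with no owner on record.

```python
from typing import Dict, List


def unowned_certificates(
    owners_by_cert: Dict[str, str], known_certs: List[str]
) -> List[str]:
    """Return, in sorted order, certificates with a missing or empty owner entry."""
    return sorted(name for name in known_certs if not owners_by_cert.get(name))
```

Each name returned is a candidate rogue or undocumented certificate that should be assigned an owner or retired.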
4. Automate Certificate Management
Manual certificate management processes are, needless to say, time-consuming and error-prone. These processes often delay renewals and are subject to certificate provisioning errors resulting in outages. To eliminate these delays and human errors, it is essential that you automate certificate management. An automated certificate lifecycle management (CLM) solution allows you to automatically renew certificates based on pre-set policies and deploy them on the right device or application without any human intervention. This helps ensure that all certificates are renewed on time and properly installed. An advanced automation solution can also enable you to automatically renew certificates with newer and safer crypto standards for stronger security.
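The idea of renewing on pre-set policies can be sketched in a few lines. This is not how any specific CLM product works; `RenewalPolicy`, `renew_due_certificates`, and the injected `renew` callable are hypothetical names, and the real request to a certificate authority plus the deployment step are assumed to live behind `renew`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RenewalPolicy:
    renew_before_days: int   # renew once this many days or fewer remain
    key_algorithm: str       # crypto standard to request on renewal


def renew_due_certificates(
    days_left_by_cert: Dict[str, float],
    policy: RenewalPolicy,
    renew: Callable[[str, str], None],
) -> List[str]:
    """Call `renew` for every certificate inside the policy window and return their names."""
    renewed = []
    for name, days_left in sorted(days_left_by_cert.items()):
        if days_left <= policy.renew_before_days:
            # `renew` stands in for the CA request and deployment (assumption).
            renew(name, policy.key_algorithm)
            renewed.append(name)
    return renewed
```

Passing the key algorithm through the policy is what lets a renewal double as an upgrade to a newer crypto standard.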
Becoming Outage-Free is a Necessity – the Sooner, the Better.
Outages continue to remain a serious challenge for organizations. Besides the economic implications of business downtime and lost productivity, the organization’s reputation also takes a blow due to disgruntled consumers.
In the aftermath of the WhatsApp outage, shares in Meta, the parent company, reportedly dropped 0.7% to $128.85 in pre-market trading. Past WhatsApp outages have also prompted users to seek alternative services, leading to a surge in downloads of Telegram and Snap. This churn highlights the scope and impact outages can have on businesses.
Prevent Outages with AppViewX CERT+ Certificate Lifecycle Automation
Preventing certificate-related outages is possible. Following certificate management best practices and investing in a mature certificate lifecycle automation solution is a great place to start towards outage prevention.
AppViewX CERT+ is a full-fledged certificate lifecycle management solution that provides holistic visibility of your certificate infrastructure and automates certificate lifecycle processes end-to-end to help eliminate outages and security vulnerabilities.
With AppViewX CERT+, you can:
- Automatically discover all certificates in your infrastructure and take control of your certificate inventory with centralized management
- Gain complete visibility of the certificate ecosystem to accurately track certificates for expiry and validity
- Enforce key and certificate security policies to establish certificate ownership and ensure timely renewals
- Automate certificate renewal and provisioning for fast and error-free renewals