On March 13, 2019, Facebook suffered a partial outage that severely affected users around the world. The service interruption lasted around 24 hours, and Facebook's products and applications were inaccessible or unreliable for many users. Notably, Facebook owns WhatsApp and Instagram, two other major communication platforms, and both went down as well. Moreover, users who signed in to other platforms and applications with their Facebook credentials had difficulty accessing those services too.
The impact was so severe that the world’s largest social media platform, with 2.3 billion active users at the time, had to turn to rival Twitter to inform users about the problems. The issue drew significant attention as frustrated users vented on other platforms. Many small businesses and startups rely on Facebook, WhatsApp, and Instagram for business communications, and all three went down. Users were not the only ones affected; Facebook's business took a considerable hit as well. A Yahoo Finance report projected that Facebook would bring in $69 billion in revenue in 2019, which puts an average of about $189 million per day at stake ($69 billion ÷ 365 days). Facebook's stock price fell about 1.5% the next day.
Facebook attributed the outage to a “server configuration change”, which forced its network team to completely reconfigure the affected systems. Facebook had also experienced downtime in 2018, when a server bug knocked it offline for around 24 hours, but the 2019 outage was widely reported as the worst in Facebook's history in both scope and length.
Intermittent server- or network-level issues that cause downtime are common for online services. In June 2019, even Google experienced a massive network disruption that caused elevated error rates across services such as Google Cloud Platform, Gmail, YouTube, G Suite, and Google Drive. Again, the root cause was a mismanaged server configuration change: configuration intended for servers in one region was wrongly applied to servers in neighboring regions.
Automated Configuration Drift Management is the Solution
We always strive to keep servers across the network infrastructure in a desired, consistent state, even as changes are made constantly, so that systems run at peak performance without interruption. Business requirements are in constant flux, and every week network leaders must respond to hundreds of change requests that involve configuring network devices and servers. Most of these configuration changes are repetitive and involve multiple stakeholders. As changes accumulate, devices can gradually diverge from their desired state; this divergence is known as configuration drift. Even minor configuration errors can lead to inconsistencies and poor performance, and ultimately cause a network outage that puts business operations and security at risk.
When errors or misconfigurations occur, teams need a way to compare what changed. During an outage, network leaders and the many stakeholders involved often have to comb through hundreds of lines of configuration to troubleshoot the issue. In every case, what they want is an easy way to roll the configuration back to the last known working version.
However, manual and legacy approaches to configuring devices have limitations and offer no way to ensure compliance with government regulations, industry best practices, and standards. The key to overcoming these challenges is to invest in an advanced automation solution for configuration drift management.
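To make configuration drift concrete, the following minimal Python sketch (not AppViewX code) compares a device's running configuration with its desired, last known-good configuration and derives the actions needed to restore it. It assumes each configuration can be flattened into setting/value pairs; the setting names, values, and the `detect_drift` and `remediation_plan` helpers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: a configuration is a flat mapping of setting name -> value.
Config = Dict[str, str]
Drift = Dict[str, Tuple[Optional[str], Optional[str]]]


def detect_drift(desired: Config, running: Config) -> Drift:
    """Return {setting: (desired_value, running_value)} for every setting that differs."""
    drift: Drift = {}
    for key in sorted(desired.keys() | running.keys()):
        want, have = desired.get(key), running.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


def remediation_plan(drift: Drift) -> List[Tuple[str, str, Optional[str]]]:
    """Turn detected drift into ordered actions that restore the desired state."""
    plan = []
    for key, (want, _have) in drift.items():
        if want is None:
            plan.append(("delete", key, None))  # setting exists only on the device
        else:
            plan.append(("set", key, want))  # setting is missing or has the wrong value
    return plan


# Hypothetical last known-good ("desired") and current running configurations.
desired = {"ntp.server": "10.0.0.1", "snmp.community": "ops-ro", "ssh.version": "2"}
running = {"ntp.server": "10.0.0.9", "ssh.version": "2", "telnet.enabled": "true"}

for action in remediation_plan(detect_drift(desired, running)):
    print(action)
```

Deriving an explicit plan, rather than blindly overwriting the whole configuration, keeps each correction reviewable, which matters when several stakeholders must sign off on a change.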
Prevent Network Outages with AppViewX ADC+
AppViewX understands the consequences that network errors and outages have for organizations, and its platform is relied on by several large Fortune enterprises. AppViewX ADC+ is designed to deliver services quickly while ensuring compliance, and it makes it easy for NetOps teams to put both proactive and reactive configuration management in place.
It also facilitates:
- Elimination of unauthorized changes and of errors caused by manual configuration
- Flexibility to undo changes on network devices
- Configuration comparisons before changes are pushed to end devices
- Audit preparation
- Adherence to best practices and benchmarks
How ADC+ helps avoid configuration drift:
1. Automated remediation while ensuring compliance
Some tools offer partial solutions for server compliance management and support for infrastructure components, but they lack pre-built templates, remediation capabilities, and the ability to make changes to the network infrastructure. This makes compliance management one of the most challenging parts of network operations, with configuration complexity, limited visibility, and changing regulatory standards as the main problem areas. AppViewX ADC+ enforces compliance and provides pre-built templates for change management.
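As an illustration of template-based compliance checking in general, here is a minimal Python sketch that evaluates a configuration against a small rule set, where each rule is a regular expression that must, or must not, match some line. The `Rule` type, rule IDs, patterns, and IOS-style configuration lines are hypothetical examples, not AppViewX's pre-built templates.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    """One compliance rule: a regex that must (or must not) match some config line."""
    rule_id: str
    pattern: str
    required: bool  # True: at least one line must match; False: no line may match


def check_compliance(config_text: str, rules: List[Rule]) -> List[str]:
    """Return the IDs of the rules that the configuration violates."""
    lines = config_text.splitlines()
    violations = []
    for rule in rules:
        regex = re.compile(rule.pattern)
        matched = any(regex.search(line) for line in lines)
        if matched != rule.required:
            violations.append(rule.rule_id)
    return violations


# Hypothetical template, loosely modelled on common hardening guidance.
TEMPLATE = [
    Rule("SSH-V2", r"^ip ssh version 2$", required=True),
    Rule("NO-TELNET", r"^\s*transport input .*telnet", required=False),
    Rule("NTP-SET", r"^ntp server \S+", required=True),
]

config = """hostname edge-01
ip ssh version 2
line vty 0 4
 transport input ssh telnet
"""
print(check_compliance(config, TEMPLATE))
```

Keeping rules as data rather than code means a template can be reviewed, versioned, and reused across devices independently of the checking logic.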
2. Automation Workflows
The AppViewX platform offers workflow-centric solutions to automate, track, and report changes while ensuring compliance with policies. Its intuitive interface helps users build complex workflows with validation checks at every step. Post-validation then acts as a sanity test, confirming that applications still function correctly and that the configuration was deployed according to organizational standards. Users can also define a rollback workflow that is triggered automatically if a workflow fails during execution.
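The pattern of pre-validation, execution, post-validation, and automatic rollback can be sketched in Python as follows. This is a generic illustration, not the AppViewX workflow engine: the `Step`, `ValidationError`, and `run_workflow` names are hypothetical, device state is modelled as a plain dictionary, and each step is assumed to either apply fully or raise before changing anything.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One change in a workflow, paired with the action that undoes it."""
    name: str
    apply: Callable[[dict], None]
    rollback: Callable[[dict], None]


class ValidationError(Exception):
    """Raised when post-validation finds the new state unacceptable."""


def run_workflow(device: dict, steps: List[Step],
                 pre_check: Callable[[dict], bool],
                 post_check: Callable[[dict], bool]) -> bool:
    """Apply steps in order; on any failure, roll back completed steps in reverse."""
    if not pre_check(device):
        return False  # pre-validation failed: do not touch the device at all
    completed: List[Step] = []
    try:
        for step in steps:
            step.apply(device)
            completed.append(step)  # only fully applied steps are rolled back
        if not post_check(device):
            raise ValidationError("post-validation failed")
        return True
    except Exception:  # broad on purpose: any failure should trigger rollback
        for step in reversed(completed):
            step.rollback(device)
        return False


# Usage: hypothetical device state with one load-balancer pool member.
device = {"pool_members": ["10.1.1.10"]}
steps = [Step("add pool member",
              apply=lambda d: d["pool_members"].append("10.1.1.11"),
              rollback=lambda d: d["pool_members"].remove("10.1.1.11"))]

ok = run_workflow(device, steps,
                  pre_check=lambda d: len(d["pool_members"]) >= 1,
                  post_check=lambda d: len(d["pool_members"]) >= 3)  # forced to fail
print(ok, device)
```

Rolling back completed steps in reverse order undoes later changes before the earlier ones they may depend on, so the device returns to its pre-workflow state.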
3. ‘Diff Checker’ for Configuration Compliance
The ‘Diff Checker’ feature checks configurations for compliance and helps ensure adherence to standards. It can also validate configuration changes between:
- Pre-Validation & Post-Validation Stages
- Implementation & Rollback Stages
- Multiple Peer Reviews in an automation process
- New configurations and the standard golden configurations
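A configuration comparison of this kind can be sketched with Python's standard `difflib` module, comparing a candidate configuration against a golden one. This is a generic example rather than the Diff Checker's implementation; the `config_diff` and `changed_lines` helpers and the configuration lines are hypothetical. The same function applies to any of the pairs listed above, such as pre- and post-validation snapshots.

```python
import difflib
from typing import List


def config_diff(golden: str, candidate: str) -> List[str]:
    """Unified diff of two configurations (golden first), one entry per line."""
    return list(difflib.unified_diff(
        golden.splitlines(), candidate.splitlines(),
        fromfile="golden", tofile="candidate", lineterm=""))


def changed_lines(diff: List[str]) -> List[str]:
    """Keep only added (+) and removed (-) lines from a unified diff."""
    body = diff[2:]  # skip the "---" / "+++" file headers
    return [line for line in body if line.startswith(("+", "-"))]


# Hypothetical golden configuration and a candidate with one drifted setting.
golden = "ip ssh version 2\nntp server 10.0.0.1\nlogging host 10.0.0.5\n"
candidate = "ip ssh version 2\nntp server 10.0.0.9\nlogging host 10.0.0.5\n"

diff = config_diff(golden, candidate)
print("\n".join(diff))
print("matches golden config:", not changed_lines(diff))
```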
4. App-Centric Visibility & Faster Troubleshooting
The topological view of applications provides a detailed, hierarchical map of the application delivery infrastructure. This app-centric visibility helps teams locate and remediate issues faster. During an outage, automation workflows can be launched easily to return to a stable state within minutes.
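One way a hierarchical map speeds up troubleshooting is impact analysis: given a failed element, walk the dependencies upward to find every application it affects. The Python sketch below is a hypothetical illustration, not how AppViewX builds its topology; it borrows load-balancer terms (virtual server, pool, pool member node) as used on ADCs such as F5 BIG-IP, and the `TOPOLOGY` data and `impacted_apps` helper are invented for the example.

```python
from typing import Dict, List, Set

# Hypothetical topology: each element maps to the elements it depends on.
TOPOLOGY: Dict[str, List[str]] = {
    "app:payments": ["vs:payments-443"],
    "app:catalog": ["vs:catalog-443"],
    "vs:payments-443": ["pool:payments"],
    "vs:catalog-443": ["pool:catalog"],
    "pool:payments": ["node:10.1.1.10", "node:10.1.1.11"],
    "pool:catalog": ["node:10.1.1.11", "node:10.1.1.12"],
}


def impacted_apps(topology: Dict[str, List[str]], failed: str) -> Set[str]:
    """Walk dependencies upward from a failed element to the applications using it."""
    # Invert the edges once so we can walk from a dependency to its dependents.
    dependents: Dict[str, List[str]] = {}
    for parent, children in topology.items():
        for child in children:
            dependents.setdefault(child, []).append(parent)
    seen: Set[str] = set()
    stack = [failed]
    while stack:  # iterative depth-first search over dependents
        current = stack.pop()
        for parent in dependents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return {name for name in seen if name.startswith("app:")}


print(sorted(impacted_apps(TOPOLOGY, "node:10.1.1.11")))
```

Because one node can back several pools, the upward walk shows at a glance that a single failure affects more than one application.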