IT Infrastructure, Network Security, Physical Security

What Happened With Facebook’s Outage?

A faulty configuration change led to outages across Facebook, Instagram and WhatsApp for at least six hours on Monday, Oct. 4.

October 6, 2021 Alyssa Borelli

[Image: Facebook outage. Credit: Chinnapong/stock.adobe.com]

A faulty configuration change took Facebook, Instagram and WhatsApp offline for at least six hours on Monday, October 4. People trying to reach the platforms saw their browsers and apps report DNS errors on connection attempts.

Facebook’s routing prefixes suddenly disappeared from the internet’s Border Gateway Protocol (BGP) routing tables. BGP is the routing protocol that networks use to tell one another which IP address ranges they can reach, which is what makes it possible for devices around the world to communicate with each other.

Because Facebook hosts its domains’ DNS records on the company’s own routing prefixes, once those BGP prefixes were withdrawn, no one could reach the IP addresses or the services running on top of them.
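The failure chain described above can be illustrated with a small Python sketch. It is a toy model, not a BGP or DNS implementation: the RoutingTable class, the resolve function, the 192.0.2.0/24 documentation prefix and the www.example.com record are all assumptions made for illustration. It shows only that a name cannot be resolved once the prefix containing the name server has been withdrawn.

```python
import ipaddress


class RoutingTable:
    """Toy stand-in for the prefixes a network has learned (hypothetical, not real BGP)."""

    def __init__(self):
        self.routes = {}  # ip_network -> next-hop label

    def announce(self, prefix, next_hop):
        self.routes[ipaddress.ip_network(prefix)] = next_hop

    def withdraw(self, prefix):
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Longest-prefix match; returns None when no route covers the address."""
        addr = ipaddress.ip_address(address)
        matches = [net for net in self.routes if addr in net]
        if not matches:
            return None
        best = max(matches, key=lambda net: net.prefixlen)
        return self.routes[best]


def resolve(table, nameserver_ip, zone, hostname):
    """Resolution fails if the name server itself has no route to it."""
    if table.lookup(nameserver_ip) is None:
        raise LookupError(f"no route to name server {nameserver_ip}")
    return zone[hostname]


# Example addresses come from a range reserved for documentation.
table = RoutingTable()
table.announce("192.0.2.0/24", "origin-network")
zone = {"www.example.com": "192.0.2.80"}

print(resolve(table, "192.0.2.53", zone, "www.example.com"))  # prints 192.0.2.80

table.withdraw("192.0.2.0/24")  # the prefix disappears from the table
try:
    resolve(table, "192.0.2.53", zone, "www.example.com")
except LookupError as err:
    print("resolution failed:", err)
```

In this model the servers behind the prefix are still running after the withdrawal. They simply cannot be found, which matches the DNS errors users saw.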

During a routine maintenance job, a command was issued with the intention of assessing the availability of global backbone capacity. It unintentionally took down all the connections in the backbone network, effectively disconnecting Facebook’s data centers globally and causing the widespread outage.


According to a Facebook blog post, the company’s systems are designed to audit commands like this to prevent mistakes, but a bug in the audit tool prevented it from stopping the command.
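Facebook has not published how its audit tool works, so the following Python sketch shows only the general idea of a pre-execution check. The Command type, the audit and run_with_audit functions, and the rule of keeping at least one backbone link active are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """Hypothetical maintenance command and the backbone links it would disable."""
    name: str
    links_to_disable: frozenset


def audit(command, active_links, min_links_remaining=1):
    """Return True if enough links would stay up after the command runs."""
    remaining = active_links - command.links_to_disable
    return len(remaining) >= min_links_remaining


def run_with_audit(command, active_links):
    """Apply the command only if the audit passes; otherwise refuse to run it."""
    if not audit(command, active_links):
        raise PermissionError(
            f"audit blocked {command.name!r}: it would leave too few backbone links"
        )
    return active_links - command.links_to_disable


links = {"link-a", "link-b", "link-c"}

# Draining a single link passes the audit.
links_after = run_with_audit(Command("drain-one", frozenset({"link-a"})), links)

# A command that would disable every link is rejected.
try:
    run_with_audit(Command("drain-all", frozenset(links)), links)
except PermissionError as err:
    print(err)
```

A guard like this is only as reliable as the check itself. In Facebook’s account, the check existed but a bug let the command through.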

The total loss of connectivity made things worse for Facebook. Engineers trying to figure out what went wrong couldn’t access the data centers through normal means because the networks were down, and the total loss of DNS broke many of the internal tools Facebook uses to investigate and resolve outages like this.

Engineers were sent onsite to the data centers to debug the issue and restart the systems. That took time because data centers are designed with high levels of physical and system security in mind.

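Once backbone network connectivity was restored, Facebook was wary of turning everything back on at once. According to the company’s blog post, data centers had reported dips in power usage during the outage, and a sudden surge in traffic reversing that dip could have put everything from electrical systems to caches at risk. The company had to flip services back on slowly.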
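A gradual restore of this kind can be sketched as a stepped ramp with a health check between steps. This is a generic illustration, not Facebook’s procedure: the ramp_schedule and restore functions, the percentage steps and the health-check callback are assumptions.

```python
def ramp_schedule(target_percent=100, step_percent=10):
    """Yield traffic levels for a staged restore, ending exactly at the target."""
    if step_percent <= 0:
        raise ValueError("step_percent must be positive")
    level = 0
    while level < target_percent:
        level = min(level + step_percent, target_percent)
        yield level


def restore(apply_level, is_healthy, target_percent=100, step_percent=10):
    """Raise traffic step by step; fall back to the last good level if a check fails."""
    last_good = 0
    for level in ramp_schedule(target_percent, step_percent):
        apply_level(level)
        if not is_healthy(level):
            apply_level(last_good)  # back off instead of pushing on
            return last_good
        last_good = level
    return last_good


# Hypothetical usage: record each level applied, and pretend that
# anything above 50 percent trips the health check.
applied = []
reached = restore(applied.append, lambda level: level <= 50, step_percent=25)
print(applied, reached)  # prints [25, 50, 75, 50] 50
```

Small steps cost time, but each one limits how much load hits the power systems and caches at once.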

While Facebook continually stress tests its systems, the company had never tested its global backbone being taken offline. “In the end, our services came back up relatively quickly without any further systemwide failures. And while we’ve never previously run a storm that simulated our global backbone being taken offline, we’ll certainly be looking for ways to simulate events like this moving forward,” said Santosh Janardhan, VP of Infrastructure at Facebook, in a blog post.

“We’ve done extensive work hardening our systems to prevent unauthorized access, and it was interesting to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity, but an error of our own making,” he said.

“I believe a tradeoff like this is worth it: greatly increased day-to-day security vs. a slower recovery from a hopefully rare event like this. From here on out, our job is to strengthen our testing, drills, and overall resilience to make sure events like this happen as rarely as possible,” said Janardhan.

Tagged With: border gateway protocol, Data Center, DNS, Facebook, infrastructure, storm drills, stress testing
