My TechDecisions

IT Infrastructure, Network Security, Physical Security

What Happened With Facebook’s Outage?

A faulty configuration change led to outages across Facebook, Instagram and WhatsApp for at least six hours on Monday, Oct. 4.

October 6, 2021, by Alyssa Borelli

A faulty configuration change took Facebook, Instagram and WhatsApp offline for at least six hours on Monday, October 4. People trying to reach the platforms saw DNS errors in their browsers and apps.

Facebook's routing prefixes suddenly disappeared from the internet's Border Gateway Protocol (BGP) routing tables. BGP is the routing protocol that networks use to tell each other which IP address ranges they can reach, which is what makes it possible for devices around the world to communicate with each other.

Because Facebook's domains and DNS records are hosted on the company's own routing prefixes, no one could connect to those IP addresses, or to the services running on top of them, once the BGP prefixes were withdrawn.
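The dependency described above can be illustrated with a toy model. This is a minimal sketch, not how BGP or DNS is actually implemented: the prefixes, the nameserver name, and the `reachable` and `resolve` helpers are all hypothetical, and the addresses come from ranges reserved for documentation.

```python
import ipaddress

# Hypothetical prefixes and nameserver address (documentation ranges),
# not Facebook's real network.
ANNOUNCED_PREFIXES = {
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
}
NAMESERVER_ADDRESS = ipaddress.ip_address("192.0.2.53")


def reachable(address, announced_prefixes):
    """An address is reachable only if some announced prefix covers it."""
    return any(address in prefix for prefix in announced_prefixes)


def resolve(announced_prefixes):
    """Simulate a lookup that has to reach the authoritative nameserver."""
    if not reachable(NAMESERVER_ADDRESS, announced_prefixes):
        return "DNS error"
    return "OK"


before_outage = resolve(ANNOUNCED_PREFIXES)  # prefixes still announced
during_outage = resolve(set())               # every prefix withdrawn
```

In this model nothing is wrong with the nameserver itself. The lookup fails only because no announced route leads to it, which is why users saw DNS errors rather than an error from Facebook's servers.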

During a routine maintenance job, a command was issued to assess the availability of global backbone capacity. It unintentionally took down all of the backbone connections, effectively disconnecting Facebook's data centers globally and causing the widespread outage.

According to a Facebook blog post, its systems are designed to audit commands like this one to prevent mistakes, but a bug in the audit tool prevented it from stopping the command.
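Facebook has not published how its audit tool works, so the following is only a sketch of the general idea of a pre-execution check. The `Command` type, the link names, and the rule that at least one backbone link must stay up are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    """Hypothetical maintenance command and the links it would disable."""
    name: str
    links_to_disable: frozenset


def audit(command, all_links):
    """Approve the command only if at least one backbone link stays up."""
    remaining = all_links - command.links_to_disable
    return len(remaining) > 0


def run(command, all_links):
    """Execute the command only after it passes the audit."""
    if not audit(command, all_links):
        return "blocked"
    return "executed"


backbone = frozenset({"link-a", "link-b", "link-c"})  # hypothetical link names
partial = run(Command("drain-one", frozenset({"link-a"})), backbone)
total = run(Command("drain-all", backbone), backbone)
```

A guard like this protects the network only if the check itself is correct. A bug that made `audit` return `True` for the second command would let it through, which is the kind of failure the blog post describes.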

The total loss of connectivity made things worse for Facebook. Engineers trying to figure out what went wrong couldn't access the data centers through normal means because the networks were down, and the total loss of DNS broke many of the internal tools Facebook normally uses to investigate and resolve outages like this.

Engineers were sent onsite to the data centers to debug the issue and restart the systems. That took time because data centers are designed with high levels of physical and system security in mind.

Once backbone network connectivity was restored, Facebook feared that turning everything back on at once would cause a surge in traffic. Data centers had seen a dip in power consumption during the outage, and suddenly reversing it could have put everything from electrical systems to caches at risk. Engineers had to bring services back gradually.
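A gradual restart of this kind can be sketched as a staged ramp. The starting percentage, the growth factor, and the `ramp_schedule` function below are hypothetical and are not taken from Facebook's post. They only show the idea of raising load in steps rather than all at once.

```python
def ramp_schedule(start_percent=5, factor=2, cap=100):
    """Yield traffic percentages that grow geometrically up to the cap."""
    if start_percent <= 0 or factor <= 1:
        raise ValueError("start_percent must be positive and factor greater than 1")
    percent = start_percent
    while percent < cap:
        yield percent
        percent *= factor
    yield cap  # always finish at full traffic


# Each step would be held until power draw and cache hit rates look stable.
schedule = list(ramp_schedule())
```

With the default values the schedule is 5, 10, 20, 40, 80 and then 100 percent, so each step adds load in a bounded increment that operators can watch before continuing.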

While Facebook continually stress tests its systems, the company had never tested a scenario in which its global backbone was taken offline. “In the end, our services came back up relatively quickly without any further systemwide failures. And while we’ve never previously run a storm that simulated our global backbone being taken offline, we’ll certainly be looking for ways to simulate events like this moving forward,” said Santosh Janardhan, VP of Infrastructure at Facebook, in a blog post.

“We’ve done extensive work hardening our systems to prevent unauthorized access, and it was interesting to see how that hardening slowed us down as we tried to recover from an outage caused not by malicious activity, but an error of our own making,” he said.

“I believe a tradeoff like this is worth it: greatly increased day-to-day security vs. a slower recovery from a hopefully rare event like this. From here on out, our job is to strengthen our testing, drills, and overall resilience to make sure events like this happen as rarely as possible,” said Janardhan.

Tagged With: border gateway protocol, Data Center, DNS, Facebook, infrastructure, storm drills, stress testing
