Late Monday, Facebook and the social media apps it owns, WhatsApp and Instagram, suffered a major global outage, including in India, lasting several hours.
The outage potentially prevented billions of users across the globe from accessing these apps or from sending and receiving messages on them.
In a tweet, Facebook wrote, “We’re aware that some people are having trouble accessing our apps and products. We’re working to get things back to normal as quickly as possible, and we apologize for any inconvenience.”
We’re aware that some people are having trouble accessing Facebook app. We’re working to get things back to normal as quickly as possible, and we apologize for any inconvenience.
— Facebook App (@facebookapp) October 4, 2021
WhatsApp, too, tweeted: “We’re aware that some people are experiencing issues with WhatsApp at the moment. We’re working to get things back to normal and will send an update here as soon as possible. Thanks for your patience!”
We’re aware that some people are experiencing issues with WhatsApp at the moment. We’re working to get things back to normal and will send an update here as soon as possible.
Thanks for your patience!
— WhatsApp (@WhatsApp) October 4, 2021
What might have caused the outage?
Facebook published a brief blog post on Monday evening attributing the massive global outage, which took its services and internal communications tools offline for several hours, to a “faulty configuration change” on its routers.
“Configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt,” Facebook vice president of infrastructure Santosh Janardhan wrote in the post.
“The underlying cause of this outage also impacted many of the internal tools and systems we use in our day-to-day operations, complicating our attempts to quickly diagnose and resolve the problem,” Janardhan added.
Expanding on Facebook's statement about the root cause of the outage, tech experts pointed to a Domain Name System (DNS) mishap.
“Their [Facebook and its affiliates] DNS names stopped resolving, and their infrastructure IPs were unreachable. It was as if someone had ‘pulled the cables’ from their data centers all at once and disconnected them from the Internet,” two senior engineers at Cloudflare, a web infrastructure and security company, explained in a blog post.
According to the duo, Facebook stopped announcing the routes to its DNS servers through the Border Gateway Protocol (BGP), the mechanism used to exchange routing information between autonomous systems (AS) on the Internet.
Simply put, the Internet is a network of networks that constantly exchange routing information so that every packet can be delivered to its final destination. BGP binds these networks together by letting Facebook and other websites advertise their presence to the other networks that form the Internet. When Facebook's route advertisements disappeared, those networks could no longer find it, which caused the outage.
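To make the idea of announcing and withdrawing routes concrete, here is a minimal Python sketch. It is a toy model, not real BGP: the `RoutingTable` class, the prefix 192.0.2.0/24, and the AS number 64500 are all hypothetical values reserved for documentation, and none of them come from Facebook's network.

```python
import ipaddress


class RoutingTable:
    """Toy model of the routes one network has learned from its peers."""

    def __init__(self):
        # Maps an announced prefix to the AS number that originated it.
        self.routes = {}

    def announce(self, prefix, origin_as):
        """Record that origin_as advertises reachability for prefix."""
        self.routes[ipaddress.ip_network(prefix)] = origin_as

    def withdraw(self, prefix):
        """Forget a prefix, as happens when its owner stops announcing it."""
        self.routes.pop(ipaddress.ip_network(prefix), None)

    def lookup(self, address):
        """Return the origin AS for address, or None if no route covers it."""
        addr = ipaddress.ip_address(address)
        matches = [p for p in self.routes if addr in p]
        if not matches:
            return None
        # Prefer the most specific (longest) matching prefix.
        best = max(matches, key=lambda p: p.prefixlen)
        return self.routes[best]


table = RoutingTable()
table.announce("192.0.2.0/24", 64500)       # hypothetical DNS prefix
print(table.lookup("192.0.2.53"))           # a route exists

table.withdraw("192.0.2.0/24")              # the owner stops announcing it
print(table.lookup("192.0.2.53"))           # no route: address unreachable
```

Once the prefix is withdrawn, the lookup finds nothing, which is the situation other networks were in when they tried to reach Facebook's DNS servers.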
Facebook DNS servers unavailable
When anyone tries to reach Facebook in a browser through its URL, https://facebook.com, a DNS resolver translates the domain into an IP address. The resolver first checks its cache and, if it finds an answer there, uses it. If nothing is cached, it queries the domain's nameservers, which are typically hosted by the entity that owns the domain. If those nameservers are also unavailable or fail to respond, the browser displays an error to the user.
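The lookup order described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how any production resolver is written: the class names, the domain example.com, and the address 192.0.2.10 are hypothetical, and the sketch ignores details such as cache expiry times and retries.

```python
class NameserverUnreachable(Exception):
    """Raised when the domain's nameservers cannot be contacted."""


class AuthoritativeServer:
    """Stand-in for the nameservers run by the domain's owner."""

    def __init__(self, records, reachable=True):
        self.records = records      # domain name -> IP address
        self.reachable = reachable

    def query(self, name):
        if not self.reachable:
            raise NameserverUnreachable(name)
        return self.records[name]


class Resolver:
    """Answers from the cache first, then asks the authoritative server."""

    def __init__(self, authoritative):
        self.authoritative = authoritative
        self.cache = {}

    def resolve(self, name):
        if name in self.cache:
            return self.cache[name]
        address = self.authoritative.query(name)  # may raise
        self.cache[name] = address
        return address


server = AuthoritativeServer({"example.com": "192.0.2.10"})
resolver = Resolver(server)
print(resolver.resolve("example.com"))   # fetched, then cached

server.reachable = False                 # nameservers drop off the Internet
print(resolver.resolve("example.com"))   # still answered from the cache

resolver.cache.clear()                   # simulate the cached entry expiring
try:
    resolver.resolve("example.com")
except NameserverUnreachable:
    print("lookup failed: the browser would show an error")
```

The last step shows why an outage of this kind spreads as caches empty: once no cached answer remains and the nameservers cannot be reached, every lookup fails.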
“At 1658 UTC we noticed that Facebook had stopped announcing the routes to their DNS prefixes. That meant that, at least, Facebook’s DNS servers were unavailable. Because of this Cloudflare’s 1.1.1.1 DNS resolver could no longer respond to queries asking for the IP address of facebook.com or instagram.com,” the blog post said, alongside a screenshot of the failed queries to Facebook's DNS servers.
Meanwhile, other Facebook IP addresses remained routed but weren’t particularly useful because, without DNS, Facebook and its related services were effectively unavailable, it added.