Cloudflare has published a detailed report explaining the origins of a significant network disruption that knocked out global internet traffic for multiple hours, impacting millions of users and a wide range of platforms.

The disruption, which began at 11:20 UTC, stemmed from an internal configuration error rather than any cyber threat, underscoring the vulnerabilities that persist even in the most robust cloud infrastructure.

This occurrence mirrors recent interruptions at rivals such as Microsoft Azure and Amazon Web Services, raising concerns about the fragility of global digital dependencies.

The trouble began with a routine permissions update to Cloudflare’s ClickHouse database cluster, intended to improve security for distributed queries.

At 11:05 UTC, the change made underlying table metadata in the ‘r0’ database visible to users; a Bot Management query did not account for the newly visible database and returned duplicate column entries, inflating a critical feature file to roughly double its expected size.
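
To picture the failure mode, here is a minimal Rust sketch, with entirely hypothetical names and data rather than Cloudflare’s actual query or code, showing how a metadata lookup that does not filter by database can silently double a derived feature list once a second database becomes visible:

```rust
// Hypothetical illustration: the database names, table names, and types here
// are invented for the sketch, not taken from Cloudflare's systems.

struct FeatureRow {
    database: String,
    table: String,
    column: String,
}

/// Builds the feature list from column metadata. When `filter_default_only`
/// is false (the buggy behaviour), every column visible in any database is
/// emitted, so a newly visible duplicate database doubles the output.
fn build_feature_list(metadata: &[FeatureRow], filter_default_only: bool) -> Vec<String> {
    metadata
        .iter()
        .filter(|row| !filter_default_only || row.database == "default")
        .map(|row| format!("{}.{}", row.table, row.column))
        .collect()
}

fn main() {
    // After the permissions change, the same columns appear under both
    // the assumed "default" database and the newly visible "r0" database.
    let metadata = vec![
        FeatureRow { database: "default".into(), table: "bot_features".into(), column: "signal_a".into() },
        FeatureRow { database: "r0".into(),      table: "bot_features".into(), column: "signal_a".into() },
    ];
    // Without a database filter, every feature is emitted twice, doubling the file.
    println!("unfiltered feature count: {}", build_feature_list(&metadata, false).len()); // 2
    println!("filtered feature count:   {}", build_feature_list(&metadata, true).len());  // 1
}
```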

This file, regenerated every five minutes so the machine-learning model can track evolving bot threats, exceeded the software’s hardcoded limit of 200 features, triggering panics in the core proxy system known as FL.
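
The panic mechanism can be sketched in a few lines of Rust. The 200-feature cap is taken from Cloudflare’s report, but the loader, types, and error handling below are invented for illustration: memory is preallocated for a fixed maximum, and unwrapping the resulting error turns an oversized input into a crash.

```rust
// Illustrative sketch only: the 200-feature cap comes from Cloudflare's report,
// but the types, loader, and error handling here are assumptions.

const MAX_FEATURES: usize = 200; // memory is preallocated for at most this many

struct BotFeatures {
    names: Vec<String>,
}

fn load_features(lines: Vec<String>) -> Result<BotFeatures, String> {
    if lines.len() > MAX_FEATURES {
        return Err(format!(
            "feature file has {} entries, limit is {}",
            lines.len(),
            MAX_FEATURES
        ));
    }
    Ok(BotFeatures { names: lines })
}

fn main() {
    // A doubled file (~400 entries) blows past the preallocated limit.
    let doubled: Vec<String> = (0..400).map(|i| format!("feature_{i}")).collect();
    // Unwrapping the error instead of handling it is what turns a bad input
    // into a panic that halts request processing.
    let features = load_features(doubled).unwrap(); // panics here
    println!("loaded {} features", features.names.len());
}
```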

The failures were initially misidentified as a massive DDoS attack, a suspicion reinforced by the coincidental downtime of Cloudflare’s external status page, and their intermittent nature baffled investigators as good and bad files alternated during the cluster’s gradual rollout.

The Bot Management module, vital for evaluating automated traffic, stopped processing requests, triggering cascading errors across the network. In the newer FL2 proxy, this surfaced as outright 5xx HTTP errors; older FL versions instead defaulted bot scores to zero, potentially blocking legitimate traffic for customers using bot-blocking rules.
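
The two proxy behaviours roughly correspond to fail-closed versus fail-open error handling. The Rust comparison below is an illustrative sketch under assumed types, status codes, and score thresholds, not Cloudflare’s actual FL or FL2 code:

```rust
// Hedged sketch of the two failure modes described above. All names, status
// codes, and the score threshold are illustrative assumptions.

#[derive(Clone, Copy)]
struct ModuleError;

// FL2-style handling: the Bot Management failure propagates, so the request
// itself fails and the user sees a 5xx error page.
fn handle_fail_closed(score: Result<u8, ModuleError>) -> u16 {
    match score {
        Ok(_) => 200,
        Err(_) => 500,
    }
}

// Older-FL-style handling: the failure is swallowed and the score defaults to 0.
// Bot-blocking rules read a very low score as "certainly a bot", so legitimate
// visitors can be blocked on sites that enforce such rules.
fn handle_default_zero(score: Result<u8, ModuleError>) -> u16 {
    let score = score.unwrap_or(0);
    if score < 10 { 403 } else { 200 } // illustrative blocking threshold
}

fn main() {
    let failure: Result<u8, ModuleError> = Err(ModuleError);
    println!("FL2-style status: {}", handle_fail_closed(failure));   // 500
    println!("older-FL status:  {}", handle_default_zero(failure));  // 403
}
```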

The blackout severely impacted core services, presenting error pages to users accessing Cloudflare-secured sites, while latency surged due to resource-intensive debugging efforts.

Turnstile CAPTCHA completely failed, preventing logins; Workers KV experienced heightened errors, indirectly hampering dashboard access and authentication through Cloudflare Access.

Email Security temporarily lost some spam detection capabilities, although no significant customer data was compromised, and configuration updates were delayed. By 17:06 UTC, full resolution was achieved after ceasing bad-file propagation, reverting to a known-good version, and restarting the proxies.
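
The “reverting to a known-good version” step reflects a common last-known-good pattern: keep the most recent file that passed validation and swap back to it whenever a new candidate is rejected. Below is a minimal Rust sketch with invented names, where a simple size check stands in for whatever validation is actually applied:

```rust
// Minimal last-known-good rollback sketch; names and the size check are assumptions.

use std::sync::{Arc, RwLock};

struct FeatureFile {
    entries: Vec<String>,
}

struct FeatureStore {
    active: RwLock<Arc<FeatureFile>>,          // file the proxy currently reads
    last_known_good: RwLock<Arc<FeatureFile>>, // last file that passed validation
}

impl FeatureStore {
    /// Accepts a candidate file only if it passes validation; otherwise rolls
    /// the active file back to the last known-good copy.
    fn try_update(&self, candidate: FeatureFile, max_entries: usize) {
        if candidate.entries.len() <= max_entries {
            let good = Arc::new(candidate);
            *self.active.write().unwrap() = Arc::clone(&good);
            *self.last_known_good.write().unwrap() = good;
        } else {
            let rollback = Arc::clone(&*self.last_known_good.read().unwrap());
            *self.active.write().unwrap() = rollback;
        }
    }
}

fn main() {
    let initial = Arc::new(FeatureFile { entries: vec!["feature_0".to_string()] });
    let store = FeatureStore {
        active: RwLock::new(Arc::clone(&initial)),
        last_known_good: RwLock::new(initial),
    };

    // A doubled, oversized candidate is rejected; the known-good file stays active.
    let bad = FeatureFile { entries: (0..400).map(|i| format!("feature_{i}")).collect() };
    store.try_update(bad, 200);
    println!("active feature count: {}", store.active.read().unwrap().entries.len()); // 1
}
```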

Cloudflare’s CEO, Matthew Prince, extended heartfelt apologies, characterizing the event as “deeply painful” and unacceptable for a leading internet service provider. The company recognized this as its most significant core traffic outage since 2019.

Outages Across the Major Cloud Giants

This event underscores a troubling pattern of failures associated with configuration errors among prominent cloud providers.

Only weeks earlier, on October 29, 2025, Azure suffered a global outage caused by an inadvertent tenant configuration change in its Front Door CDN, disrupting Microsoft 365, Teams, and Xbox for several hours and affecting companies such as Alaska Airlines.

Likewise, AWS endured a roughly 15-hour outage on October 20 in its US-East-1 region, where DNS problems affecting DynamoDB cascaded into EC2, S3, and services such as Snapchat and Roblox.

A smaller AWS glitch then hit Amazon.com’s e-commerce operations on November 5, halting checkouts amid holiday preparations. Experts caution that these incidents underscore an over-reliance on centralized providers, where a single error can repeatedly “break the internet,” as 2025 has shown.

To prevent a recurrence, Cloudflare is hardening its file ingestion pipeline against malformed inputs. The company is also adding global kill switches for features, limiting the ability of error reporting to overwhelm system resources, and reviewing failure modes across its core proxies.
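
A rough Rust sketch of the first two ideas, pre-flight validation of an ingested file plus a global kill switch, is shown below; the names, error cases, and checks are assumptions for illustration rather than Cloudflare’s actual mechanisms.

```rust
// Hedged sketch: validate a candidate feature file before it reaches the proxy,
// and allow a global kill switch to disable the pipeline. Names are invented.

use std::sync::atomic::{AtomicBool, Ordering};

const MAX_FEATURES: usize = 200;

// A global kill switch an operator could flip to pause feature updates
// entirely while an incident is investigated.
static BOT_FEATURES_KILL_SWITCH: AtomicBool = AtomicBool::new(false);

#[derive(Debug)]
enum IngestError {
    Empty,
    TooManyEntries(usize),
    MalformedEntry,
}

/// Returns Ok(None) when the pipeline is disabled, Ok(Some(entries)) when the
/// file passes validation, and Err(..) when it must be dropped so the previous
/// file stays active.
fn ingest_feature_file(raw: &str) -> Result<Option<Vec<String>>, IngestError> {
    if BOT_FEATURES_KILL_SWITCH.load(Ordering::Relaxed) {
        return Ok(None);
    }
    let entries: Vec<String> = raw.lines().map(|line| line.to_string()).collect();
    if entries.is_empty() {
        return Err(IngestError::Empty);
    }
    if entries.len() > MAX_FEATURES {
        return Err(IngestError::TooManyEntries(entries.len()));
    }
    if entries.iter().any(|e| e.trim().is_empty()) {
        return Err(IngestError::MalformedEntry);
    }
    Ok(Some(entries))
}

fn main() {
    let doubled_file = "feature_a\n".repeat(400); // a file twice the expected size
    match ingest_feature_file(&doubled_file) {
        Ok(Some(entries)) => println!("accepted {} features", entries.len()),
        Ok(None) => println!("feature pipeline disabled by kill switch"),
        Err(e) => println!("rejected feature file, keeping previous one: {e:?}"),
    }
}
```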

Even though the outage was not a result of malicious intent, it serves as a stark reminder that as cloud ecosystems grow, the demand for operational precision intensifies.
