
When the Internet Blinked: Inside the November 18 Cloudflare Outage and What Really Happened  

On November 18, 2025, the Internet hit a strange and unexpected speed bump. Websites worldwide — from small businesses to major enterprises — began showing error pages. Apps struggled to connect. Authentication systems failed. Even Cloudflare’s own status page briefly went offline.

At first glance, it looked like the start of a massive cyberattack.

But the truth was far more surprising.

In this detailed breakdown, we at IHA Cloud walk through exactly what happened, why it happened, and what the incident tells us about the hidden complexity of the Internet’s infrastructure.


A Normal Day — Until 11:20 UTC  

Cloudflare operates one of the most widely distributed networks in the world. At 11:20 UTC, that network suddenly began returning 5xx errors — HTTP status codes that signal a server-side failure — for millions of requests.

Visitors saw Cloudflare-branded error messages when trying to access sites.

Behind the scenes, engineers saw erratic behavior: traffic would fail, then suddenly recover, then fail again. This “flickering” made the situation look eerily like a high-volume, targeted DDoS attack. But the real cause was much quieter — and buried deep inside Cloudflare’s internal systems.

Not an Attack — A Database Permission Change Gone Wrong  

The outage began with an update to Cloudflare’s ClickHouse database cluster. The update was meant to improve permission management and make queries more secure.

But one unexpected side-effect changed everything.

How it spiraled:  

  1. A database permission change caused a query to return duplicate rows.
  2. That query was responsible for generating a “feature file” — a configuration file used by Cloudflare’s Bot Management system, which helps distinguish humans from bots.
  3. The feature file doubled in size unexpectedly.
  4. Cloudflare’s routing software downloaded this file globally.
  5. The software had a hidden size limit.
  6. When the file exceeded that limit, the routing process crashed.
  7. Crashed routing = failed traffic = global 5xx errors.
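
To make the crash mechanism concrete, here is a minimal Rust sketch of steps 4 to 6. It is not Cloudflare's code: the limit value, file format, and function names are our assumptions, but it shows how a fixed cap plus an unhandled error can turn an oversized config file into a dead process.

```rust
// Hypothetical sketch of the crash mechanism described above.
// MAX_FEATURES, the file format, and all names are illustrative assumptions.

const MAX_FEATURES: usize = 200; // hidden hard limit inside the proxy

/// Parse a feature file (one feature name per line) into a bounded list.
fn load_features(file_contents: &str) -> Result<Vec<String>, String> {
    let mut features = Vec::with_capacity(MAX_FEATURES);
    for line in file_contents.lines() {
        if features.len() >= MAX_FEATURES {
            return Err(format!(
                "feature file exceeds limit of {MAX_FEATURES} entries"
            ));
        }
        features.push(line.to_string());
    }
    Ok(features)
}

fn main() {
    // A normal-sized file loads fine...
    let good_file: String = (0..120).map(|i| format!("feature_{i}\n")).collect();
    // ...but the same content with every row duplicated is twice the size.
    let doubled_file = good_file.repeat(2);

    load_features(&good_file).expect("normal file should load");
    // Here, as in the incident, the oversized file takes the process down:
    load_features(&doubled_file).expect("feature file within limits"); // panics
}
```

Running the sketch, the doubled file trips the limit and the process aborts, which is the per-machine version of the global 5xx errors described above.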

To make things worse, the feature file was regenerated every 5 minutes. Sometimes it was generated correctly. Sometimes it wasn’t. That’s why Cloudflare experienced a cycle of:

✔ normal service
✖ failure
✔ normal service
✖ failure

This back-and-forth made diagnosis extremely difficult.
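
The flapping is easier to picture with a toy simulation (our own illustration, not Cloudflare's pipeline). It assumes, consistent with the on/off behavior described above, that the generating query only sometimes saw the extra metadata introduced by the permission change.

```rust
// Toy simulation of the 5-minute regeneration cycle described above.
// The schema names, feature names, and strict alternation are assumptions
// made purely for illustration.

/// Metadata rows visible to the feature-file query. After the permission
/// change, the same columns also show up under an extra underlying schema,
/// so the query returns every row twice.
fn visible_rows(sees_extra_schema: bool) -> Vec<(&'static str, &'static str)> {
    let base = vec![("default", "feature_a"), ("default", "feature_b")];
    if sees_extra_schema {
        let mut doubled = base.clone();
        doubled.extend([("underlying", "feature_a"), ("underlying", "feature_b")]);
        doubled
    } else {
        base
    }
}

fn main() {
    // Each iteration stands in for one 5-minute regeneration; whether the
    // query "sees" the extra schema alternates here, standing in for a query
    // that only sometimes reached a part of the cluster with the new permissions.
    for cycle in 0..4 {
        let rows = visible_rows(cycle % 2 == 1);
        let status = if rows.len() > 2 {
            "✖ oversized file, proxies fail"
        } else {
            "✔ normal service"
        };
        println!("cycle {cycle}: feature file has {} rows -> {status}", rows.len());
    }
}
```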

Why Engineers First Suspected a DDoS Attack  

During this chaos, another unrelated glitch occurred:
Cloudflare’s status page — which is hosted outside Cloudflare — went down due to an entirely separate issue.

To engineers dealing with fluctuating errors, massive 5xx spikes, and a dead status page… it looked exactly like a coordinated large-scale attack.

Even internal chats reflected this suspicion.

Only later did the team trace the root cause: the oversized Bot Management feature file.


How Cloudflare Stabilized the Internet Again  

It took several steps to untangle the issue:

1. Stopping the spread of the faulty configuration file  

Cloudflare paused generation of the feature file to prevent new bad versions from propagating.

2. Rolling back to a last-known-good version  

A clean feature file was manually injected into the distribution system.

3. Restarting core proxy services globally  

Once devices had the correct file, the routing layer (FL and FL2) began recovering.

4. Fixing downstream services such as:  

  • Workers KV
  • Access
  • Turnstile
  • Dashboard authentication

5. Full recovery:  

By 14:30 UTC, most traffic was back to normal.
By 17:06 UTC, all Cloudflare systems were fully recovered.


Who Was Impacted?  

Because Cloudflare sits in front of a huge portion of the Internet, the outage affected:

  • Websites (HTTP 5xx errors)
  • API requests
  • Login systems relying on Cloudflare Access
  • Workers KV-based internal and external services
  • CAPTCHA/Turnstile authentication flows
  • CDN performance due to increased CPU load during error handling

Some users even saw false-positive bot detections because bot scoring failed.


Why the Issue Became So Big  

This outage wasn’t caused by a single bug — it was a combination of:

✔ A database permissions change  

Exposing additional metadata by accident.

✔ A configuration file generator depending on that metadata  

Which doubled its output size.

✔ A strict size limit in the bot module  

Causing a panic when exceeded.

✔ Global propagation of changes  

Which meant the incorrect file hit machines everywhere, almost instantly.

✔ A coincidence: the Cloudflare status page failing simultaneously  

Creating confusion during early investigation.

This rare “perfect storm” turned a simple metadata change into a multi-hour global outage.

How Cloudflare Plans to Prevent This in the Future  

Cloudflare has publicly committed to several improvements:

1. Harden validation of internal config files  

Even internally generated files will be treated like user input — validated before they roll out.
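
As a rough illustration of what “treat internal files like user input” can mean in practice, the sketch below validates a generated feature file against explicit bounds before it is allowed to propagate. The limits, error cases, and names are our assumptions, not Cloudflare's actual checks.

```rust
// Illustrative pre-deployment validation for a generated feature file.
// Limits and error cases are assumptions, not Cloudflare's actual checks.

const MAX_FEATURES: usize = 200;
const MAX_FILE_BYTES: usize = 64 * 1024;

#[derive(Debug)]
enum ConfigError {
    EmptyFile,
    TooLarge(usize),
    TooManyFeatures(usize),
    DuplicateFeature(String),
}

/// Validate a generated feature file before it is published to the fleet.
fn validate_feature_file(contents: &str) -> Result<(), ConfigError> {
    if contents.is_empty() {
        return Err(ConfigError::EmptyFile);
    }
    if contents.len() > MAX_FILE_BYTES {
        return Err(ConfigError::TooLarge(contents.len()));
    }
    let mut seen = std::collections::HashSet::new();
    let mut count = 0usize;
    for line in contents.lines() {
        count += 1;
        if count > MAX_FEATURES {
            return Err(ConfigError::TooManyFeatures(count));
        }
        if !seen.insert(line) {
            // Duplicate rows were exactly the symptom in this incident.
            return Err(ConfigError::DuplicateFeature(line.to_string()));
        }
    }
    Ok(())
}

fn main() {
    let doubled = "bot_score\nbot_score\n";
    match validate_feature_file(doubled) {
        Ok(()) => println!("publish new feature file"),
        Err(e) => println!("reject and keep last-known-good: {e:?}"),
    }
}
```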

2. Global kill switches  

Allowing teams to instantly disable problematic features.
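
A kill switch at the consuming end might look something like the sketch below (again, the flag name and defaults are assumptions on our part): a single flag that bypasses bot scoring entirely, so a broken feature feed degrades one product instead of crashing the proxy.

```rust
// Hypothetical global kill switch for the bot-scoring path.
// The flag name, default score, and plumbing are assumptions for illustration.

use std::sync::atomic::{AtomicBool, Ordering};

/// Flipped by operators (e.g. via a control plane) to disable bot scoring fleet-wide.
static BOT_MANAGEMENT_DISABLED: AtomicBool = AtomicBool::new(false);

fn bot_score(request_path: &str) -> u8 {
    if BOT_MANAGEMENT_DISABLED.load(Ordering::Relaxed) {
        // Fail open: without scoring, treat traffic as likely human rather than
        // refusing to serve it at all.
        return 99;
    }
    // Placeholder for the real feature-based scoring.
    if request_path.contains("/wp-login") { 5 } else { 80 }
}

fn main() {
    println!("score before kill switch: {}", bot_score("/index.html"));
    // Incident response: flip the switch instead of shipping an emergency build.
    BOT_MANAGEMENT_DISABLED.store(true, Ordering::Relaxed);
    println!("score after kill switch:  {}", bot_score("/index.html"));
}
```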

3. Improved error handling  

Eliminating unbounded memory allocations and avoiding system panics when limits are exceeded.
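
Our reading of this change, sketched with assumed names: when a newly delivered file fails validation, the module logs the problem and keeps serving with the last configuration it successfully loaded, rather than panicking.

```rust
// Illustration of "don't panic on a bad config": keep the last-known-good
// feature set and log the failure. All names and limits are assumptions.

const MAX_FEATURES: usize = 200;

struct BotModule {
    /// Last configuration that passed validation; never replaced by a bad one.
    active_features: Vec<String>,
}

impl BotModule {
    fn reload(&mut self, new_file: &str) {
        let parsed: Vec<String> = new_file.lines().map(|s| s.to_string()).collect();
        if parsed.is_empty() || parsed.len() > MAX_FEATURES {
            // Previously this kind of condition could escalate into a process
            // panic; here it is reduced to a logged, recoverable event.
            eprintln!(
                "rejecting feature file with {} entries; keeping {} active features",
                parsed.len(),
                self.active_features.len()
            );
            return;
        }
        self.active_features = parsed;
    }
}

fn main() {
    let mut module = BotModule { active_features: vec!["bot_score".to_string()] };
    module.reload(&"feature\n".repeat(400)); // oversized: rejected, service continues
    module.reload("feature_a\nfeature_b");   // valid: applied
    println!("active features: {}", module.active_features.len());
}
```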

4. Better safeguards between systems  

So database metadata changes can’t silently propagate into runtime systems without checks.

5. Updated failure mode reviews  

Across all core proxy modules (FL & FL2).

Why This Outage Matters (Even If Your Site Wasn’t Down)  

Incidents like this remind us how interconnected the Internet is.

A change inside a globally distributed cloud platform — even a small one — can ripple into major outages. At the same time, such incidents highlight how much engineering effort goes into ensuring reliability every day.

At IHA Cloud, we study incidents like this not to point fingers, but to improve our own infrastructure practices and resilience models.

Understanding these failures helps the entire industry evolve.


Final Thoughts  

Cloudflare hasn’t had an outage of this scale since 2019. This one was painful, unexpected, and complex — and Cloudflare has openly acknowledged it.

The silver lining?

The Internet recovered quickly, lessons were learned, and the global cloud ecosystem becomes stronger each time we analyze such events.

If you want a simpler summary:

👉 A database change caused a config file to double in size →
A size limit caused the routing software to crash →
The crash caused 5xx errors worldwide →
Cloudflare fixed it by rolling back the file and stabilizing the network.

We hope this breakdown gives you a clear and understandable look at what happened on November 18, 2025 — one of the Internet’s most unusual days in recent memory.
