19 November 25, 10:49
Quote: Cloudflare suffered its worst outage since 2019. The incident was caused by a bug in the company's Bot Management system.
Matthew Prince, Cloudflare co-founder and CEO, has published a detailed article on the company's blog to explain what went wrong.
The outage, which began on 18 November 2025 at 11:20 UTC, had a global impact as several websites went offline. Prince clarified that Cloudflare had not been hit by a DDoS attack, as had initially been suspected. That is worth stating, because people may have worried that the network had been taken down by malware. Once the problem had been identified, Cloudflare replaced the faulty feature file with a backup copy to resolve it. The company says that its core traffic was back to normal by 14:30, about three hours after the issue began, and that all systems were restored by 17:06.
Cloudflare described the issue as follows: "A change in our underlying ClickHouse query behavior (explained below) that generates this file caused it to have a large number of duplicate “feature” rows. This changed the size of the previously fixed-size feature configuration file, causing the bots module to trigger an error."
Here's a simpler explanation. Cloudflare had changed the permissions on one of its database systems, and this caused the database to output duplicate entries into a “feature file” used by the company's Bot Management system. The feature file keeps the Bot Management system up to date so that it can handle new threats. The system has many modules, including a machine learning model that generates bot scores. Every network request gets a bot score, and Cloudflare's customers (websites and services) use these scores to decide which bots to allow onto their sites and which to block.
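For anyone who wants to see the failure mode in miniature, here is a rough Python sketch of how duplicate rows can push a file past a fixed size limit. It is not Cloudflare's code: the limit of 200, the row names, and the functions `load_features` and `dedupe` are all made up for illustration.

```python
# Illustrative sketch only; names and numbers below are hypothetical.
MAX_FEATURES = 200  # assumed fixed cap on rows the loader accepts


class FeatureFileTooLarge(Exception):
    """Raised when a feature file has more rows than the loader allows."""


def load_features(rows, limit=MAX_FEATURES):
    """Accept the rows only if they fit within the fixed limit."""
    if len(rows) > limit:
        raise FeatureFileTooLarge(f"{len(rows)} rows exceeds limit of {limit}")
    return list(rows)


def dedupe(rows):
    """Drop duplicate rows while keeping the original order."""
    return list(dict.fromkeys(rows))


# A normal file fits comfortably under the limit.
normal_rows = [f"feature_{i}" for i in range(120)]
loaded = load_features(normal_rows)

# A query that returns every row twice doubles the file size.
duplicated_rows = normal_rows * 2

try:
    load_features(duplicated_rows)
    failed = False
except FeatureFileTooLarge:
    failed = True  # the loader rejects the oversized file

# Removing the duplicates brings the file back under the limit.
recovered = load_features(dedupe(duplicated_rows))
```

The point of the sketch is that nothing in the data itself is malicious or corrupt. The same rows simply appear more than once, and a consumer that assumes a fixed maximum size fails as a result.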