Service Outage Postmortems
Comments discuss recent major outages at providers such as Fastly, Cloudflare, and Backblaze, analyzing root causes including configuration errors, network failures, overloads, and cascading effects, along with praise for detailed postmortem blog posts.
Sample Comments
Wasn't their previous major outage because of a bad migration?
Was the outage a configuration error by the admins or was it foul play?
"Impact: fixed processes that led to 8 hr outage" seems like an easy case to make.
Bit drastic considering this was their first major global outage...
Our sincere apologies for tonight's downtime. We're back up now after 30 incredibly frustrating minutes, but we're making changes to ensure this incident can't be repeated. The root cause was a network failure at our CDN, Fastly. The incident was limited to a single Point of Presence (POP) in San Jose, so if you were in Europe or Asia you didn't see anything wrong, but obviously at this time of day most traffic is from the west coast. While our uptime over the last f
We're sorry https://www.youtube.com/watch?v=9u0EL_u4nvw Edit: an outage of this length smells of bad systems architecture...
That literally happened, they blogged about it recently. https://www.backblaze.com/blog/recent-outages-why-we-acceler...
Title seems misleading - a poorly chosen default behavior caused the outage
Companies that explain in great detail why an outage happened - chef's kiss
From the blog post it sounds like no. They say a service got overloaded due to an increase in the number of datacenters and triggered a bug.