Tomasz Tunguz Blog · 2021-01-26 · 1955d

Chaos Engineering: The Path to Higher Uptime and Faster Issue Resolution

Chaos engineering has become the dominant practice for improving software reliability in modern cloud architectures. Among teams that run chaos experiments, 82-92% reach the 99% uptime target: 82% of teams running them regularly, 87% of teams running them twice per month, and 92% of teams running them daily. These teams also resolve incidents faster: their median mean-time-to-resolution (MTTR) is under one day, while fewer than 30% of companies not using the practice resolve issues within a day. Major companies like Amazon, Netflix, Expedia, and Walmart adopted chaos engineering as a critical tool after moving to Kubernetes and microservices.


Metrics in this report

Mean-Time-To-Resolution · < 1 day · median · companies running chaos engineering

Mean-Time-To-Resolution · < 30% · percent achieving < 1 day resolution · companies not using chaos engineering

Target Uptime · 99% · industry standard · SLA compliance

Uptime Achievement Rate · 82% · percent achieving 99% uptime · teams running chaos regularly

Uptime Achievement Rate · 87% · percent achieving 99% uptime · teams running chaos twice per month

Uptime Achievement Rate · 92% · percent achieving 99% uptime · teams running chaos daily
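
To make the 99% uptime target concrete, it helps to convert it into a downtime budget. The sketch below is illustrative arithmetic only, not something from the original post: the function name `downtime_budget_hours` and the choice of a 365-day year and a 30-day month are assumptions made for the example. Under those assumptions, a 99% target allows roughly 87.6 hours (about 3.65 days) of downtime per year.

```python
def downtime_budget_hours(uptime_target: float, period_hours: float) -> float:
    """Return the hours of downtime allowed in a period for an uptime fraction.

    uptime_target is a fraction (0.99 means 99%); period_hours is the length
    of the measurement window in hours. Both names are hypothetical helpers
    for this illustration.
    """
    if not 0.0 <= uptime_target <= 1.0:
        raise ValueError("uptime_target must be a fraction between 0 and 1")
    return (1.0 - uptime_target) * period_hours


HOURS_PER_DAY = 24

# Assumed windows: a 365-day year and a 30-day month.
yearly = downtime_budget_hours(0.99, 365 * HOURS_PER_DAY)
monthly = downtime_budget_hours(0.99, 30 * HOURS_PER_DAY)

print(f"99% uptime allows about {yearly:.1f} hours of downtime per year")
print(f"99% uptime allows about {monthly:.1f} hours per 30-day month")
```

Read against the MTTR metrics above, a single incident that takes more than a few days to resolve could consume most of a yearly 99% budget on its own, which is one way to see why resolution time and uptime move together.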