Resilient Systems Through Retrospection

Jun 30, 2023

Every breakage is an opportunity to make our systems more resilient. After leading hundreds of incident reviews, I’ve realized that each review reduces to three simple questions. In this post, I’ll go over why these questions are exhaustive and which common tactics come from answering them.


7 Comments

Patrick

Jul 3, 2023

Good points! How do you approach issues you can't easily trace? Retries at the function level, and maybe even at the worker level, make an app or service much more resilient, but they don't fully help if you run out of memory, for example. It's often the case that you don't even see an error log because the service crashed. That's something I've found very tricky in the past.



Ryan Peterman

Jul 4, 2023

> It's often the case that you don't even see an error log because the service crashed

Logging to an external service works well for auditing what happened. That way, even if your main service crashes, you can still query and analyze the logs to see what happened up until the service stopped responding.
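As a minimal sketch of what that could look like in Python, assuming a syslog-compatible collector reachable over UDP: the function name `build_remote_logger` and the logger name `payments-worker` are hypothetical, and a real setup would likely use whatever agent or SDK your logging provider recommends.

```python
import logging
import logging.handlers


def build_remote_logger(host: str, port: int) -> logging.Logger:
    """Return a logger that ships each record to a remote syslog collector.

    Records leave the process as they are emitted, so anything logged before
    a crash is already off the box and can be queried afterwards.
    """
    # "payments-worker" is a hypothetical service name used for illustration.
    logger = logging.getLogger("payments-worker")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # do not also send records to the root logger

    # SysLogHandler sends over UDP by default when given a (host, port) tuple.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

You would call `build_remote_logger(...)` once at startup with your collector's address and then log as usual, for example `log.info("starting batch %s", batch_id)`. Note that UDP delivery is best-effort, and a process killed for running out of memory cannot log its own death, so this only preserves what was logged up to that point.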


Jordan Cutler

Jul 1, 2023

Awesome article, Ryan. I like the breakdown into the 3 questions. It’s super helpful to think about it the way you laid it out.



Ryan Peterman

Jul 2, 2023

Thank you Jordan, glad you liked it!


Ivan H

Jun 30, 2023

When you're trying to fix a bug and can't, take a break. It will put you in a better state of mind.


Danilo Tedeschi

Mar 6, 2024

Awesome article! I would love to hear more about ways to protect the release process, such as canaries or other tactics that prevent incidents from happening.



Ryan Peterman

Mar 6, 2024

Thank you Danilo, glad you liked it. That's a good idea, I'll add it to my notepad for a future article :)


The Developing Dev
