Hi, I’m Erika Rowland (a.k.a. erikareads). I’m an Ops-shaped Software Engineer, Toolmaker, and Resilience Engineering fan. I like Elixir and Gleam, Reading, and Design. She/Her. Published on October 06, 2023

Paper: Four Concepts for Resilience Engineering

Today’s paper is by David Woods: Four concepts for resilience and the implications for the future of resilience engineering. I found this paper through Fred Hebert’s excellent summary.

The word “resilience” has been used in a number of different ways in Resilience Engineering literature. This conflation of different definitions makes it difficult to parse and understand what that literature is arguing for and why.

Woods’ paper attempts to codify these various definitions into four categories. The categories clarify which properties of a system we’re talking about, and which aspects of a system are worth studying.

Resilience as Rebound

Rebound is about returning to normal functioning after a disruption.

This ability to rebound seems to depend directly on the conditions before the disrupting event. How was the system prepared before the chaos began?

While literature on rebound tends to focus on individual disruptions or traumas, the more interesting thing to study is the idea of surprises. A surprise is, to quote Woods:

the event is a surprise when it falls outside the scope of variations and disturbances that the system in question is capable of handling

A surprise is an event that challenges an existing model and forces the system to learn or adapt the model.

Resilience as Robustness

Robustness is the ability of a system to resist disruptions. However, robustness is specifically defined as resistance against a known set of disruptions. From Rebound, we know that a system will often deal with surprises that, by definition, fall outside the robustness of its model.

Thus resilience is more interested in adapting to surprise than preparing against known threats. However, this distinction is lost when conflating resilience with robustness.

Another problem is that expanding robustness in one way often opens additional vectors of failure in the event of surprises. For example, an umbrella gives you some shelter from rain, but the mechanism can jam or strong winds can drag you into unwanted positions.

Resilience as Graceful Extensibility

Graceful Extensibility sees resilience as the opposite of brittleness, where brittleness describes how a system performs when it’s pushed near or beyond its limits.

This is a definition of resilience that specifically cares about the kinds of surprise that I talked about in Rebound. While surprise falls outside of our system’s limits, it has regular characteristics because many classes of challenge recur.

Graceful extensibility is a play on the idea of graceful degradation. It suggests the ability to grow from surprises, in contrast to robustness or graceful degradation, which only allow us to mitigate breakdowns.

Another aspect of concern is decompensation, where exhaustion of a system under sustained disruption reduces its capacity to adapt to new disruptions. Think of on-call fatigue in an operations team, or deformation of a material under stress that changes its properties. (An effective way to “cut” paper without scissors is to fold it back and forth across a joint until the paper gives way with little force, allowing you to tear the paper straight by hand.) As Woods writes:

When the time to recovery increases and/or the level recovered to decreases, this pattern indicates that a system is exhausting its ability to handle growing or repeated challenges, in other words, the system is nearing saturation of its range of adaptive behavior.
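One way to make that pattern concrete is a small check over a history of recoveries. This is only a sketch of my own, not something from the paper: the `Recovery` record, the `nearing_saturation` function, the three-recovery window, and the exact rule (nothing improves and at least one measure gets worse) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Recovery:
    """One disruption and how the system came back from it (hypothetical record)."""
    time_to_recover: float  # e.g. minutes until normal service resumed
    level_recovered: float  # e.g. fraction of normal capacity regained, 0.0 to 1.0


def nearing_saturation(history: list[Recovery], window: int = 3) -> bool:
    """Return True if the last `window` recoveries show a worsening trend.

    Assumed rule: across consecutive recoveries, time to recover never
    shrinks and level recovered never rises, and at least one of the two
    gets strictly worse somewhere in the window.
    """
    recent = history[-window:]
    if len(recent) < window:
        return False  # too little data to call it a pattern

    pairs = list(zip(recent, recent[1:]))
    never_better = all(
        later.time_to_recover >= earlier.time_to_recover
        and later.level_recovered <= earlier.level_recovered
        for earlier, later in pairs
    )
    got_worse = any(
        later.time_to_recover > earlier.time_to_recover
        or later.level_recovered < earlier.level_recovered
        for earlier, later in pairs
    )
    return never_better and got_worse


# Recoveries that take longer each time and come back a little lower.
tiring = [Recovery(10, 1.0), Recovery(15, 1.0), Recovery(25, 0.9)]
# Recoveries that look the same every time.
steady = [Recovery(10, 1.0), Recovery(10, 1.0), Recovery(10, 1.0)]
```

Here `nearing_saturation(tiring)` is true and `nearing_saturation(steady)` is false. A real system would need noisier, messier signals than this, but the shape of the question is the same: are we recovering more slowly, or to a lower level, than we used to?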

Resilience as Sustained Adaptability

Sustained adaptability asks three questions of a resilience engineer:

What governance or architectural characteristics explain the difference between systems that have sustained adaptability and those that don’t?
What design principles and techniques allow you to engineer sustained adaptability?
How would you know if you succeeded in engineering sustained adaptability in a system?

Predictable challenges that test a system include:

Surprises will continue to challenge boundaries of a system.
Conditions will change over time, shifting the boundaries of a system.
When, not if, the system fails to adapt, people will need to bear that burden.
How the system needs to gracefully extend, and what allows it to gracefully extend, will change over time.
The system will need to be able to benefit from surprises, as discussed in graceful extensibility.

Architecting a system to have sustained adaptability relies on understanding that all adaptive systems are constrained by trade-offs, and that certain architectures allow for adjustment of those trade-offs.

Woods argues that definitions 1 and 2 (rebound and robustness) have led to fewer developments and are less useful overall than definitions 3 and 4 (graceful extensibility and sustained adaptability), though he exercises caution in his judgment given the youth of the field.

Thanks to Jeff for reading and editing these notes with me.
