SHAM is a framework for understanding system health by observing how much change occurs, how little breaks, and how long the system endures stress when nothing visibly fails.
It focuses on how safely software systems evolve over time, rather than on output, speed, or individual activity.
SHAM observes the health of software systems by looking at how they change under real-world conditions. Instead of focusing only on delivery metrics or visible incidents, SHAM examines how much of a system is touched, how it absorbs change, how often failures occur, how long they last, and the context in which the work happens.
The core idea is simple:
Healthy systems can change more while breaking less, and recover quickly when they do break.
Much of the most valuable work in software development is preventative. When architecture is sound, risks are addressed early, and processes are clear, the result is often that nothing obviously goes wrong. That success is real—but it is usually invisible. SHAM exists to make those patterns visible over time.
SHAM views systems through a small set of lenses:
- Change surface — how much of the system was touched
- Failures — what actually broke due to code
- Recovery — how long issues persisted
- Time — development time, exposure time, and recovery time
- Context — discovery level, design agency, constraints, and blockers
Individually, these signals are incomplete. Together, they describe how safely a system evolves.
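As a rough illustration of how the lenses might be recorded together, here is a minimal Python sketch. Every name in it (`ChangeCycle`, `summarize`, the field names, the "failures per 100 files" ratio) is an assumption made for this example, not one of SHAM's formal metrics, and the sample values are invented.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class ChangeCycle:
    """One cycle of change seen through the five lenses (illustrative field names)."""
    files_touched: int            # change surface
    failures: int                 # failures caused by code
    recovery_hours: list[float]   # recovery: how long each failure persisted
    dev_hours: float              # time: development time (an estimate is fine)
    exposure_days: float          # time: how long the change has been live
    context: dict[str, str] = field(default_factory=dict)  # discovery level, design agency, ...


def summarize(cycles: list[ChangeCycle]) -> dict[str, float]:
    """Combine the signals across cycles; a single cycle says little on its own."""
    touched = sum(c.files_touched for c in cycles)
    failures = sum(c.failures for c in cycles)
    recoveries = [h for c in cycles for h in c.recovery_hours]
    return {
        "files_touched": touched,
        "failures": failures,
        # Hypothetical ratio, chosen only to relate failures to change surface.
        "failures_per_100_files": 100 * failures / touched if touched else 0.0,
        "mean_recovery_hours": mean(recoveries) if recoveries else 0.0,
    }


# Invented sample data: a large change that broke nothing, a small one that broke twice.
cycles = [
    ChangeCycle(40, 0, [], 30.0, 14.0, {"design_agency": "team"}),
    ChangeCycle(10, 2, [3.0, 5.0], 12.0, 7.0, {"design_agency": "mandated"}),
]
summary = summarize(cycles)
print(summary)
```

Keeping the context alongside the numbers is deliberate in this sketch: the same failure count reads differently when the design was mandated than when the team chose it.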
SHAM is not designed to explain individual events. Its value emerges across many cycles of change.
Over time, it can surface patterns such as:
- Systems that tolerate large changes with minimal failure
- Work that appears small but creates outsized drag
- The effects of involving developers late in a project
- How mandated design influences recovery and stability
- Where technical risk accumulates quietly
These patterns are often felt intuitively by teams long before they are measurable. SHAM provides a shared language for discussing them.
SHAM is:
- A framework, not a product
- A lens, not a scorecard
- A way to reason about system behavior, not people
- Compatible with existing engineering, SRE, and DevOps practices
- Designed to work with estimates and imperfect data
SHAM favors consistency and context over precision. Its purpose is to support clearer thinking and better conversations, not automation or enforcement.
This repository documents the concepts, metrics, and practical considerations behind SHAM, including:
- The philosophy behind measuring absence of failure
- Core concepts and definitions
- Formal metrics and interpretation guidance
- Implementation considerations and tradeoffs
- Examples drawn from real-world scenarios
- Common anti-patterns to avoid
You do not need to adopt SHAM in full for it to be useful. Many teams take individual concepts or language and adapt them to their environment.
SHAM is a long-view framework. Meaningful signals emerge over months, not days. Early use should focus on calibration and learning rather than conclusions.
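One way to keep early use focused on calibration is to aggregate by month and decline to report a number while the data is still thin. The sketch below assumes hypothetical records, a made-up `MIN_CYCLES` threshold, and an illustrative "failures per 100 files touched" ratio; none of these are prescribed by SHAM.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records: (date the change shipped, files touched, failures attributed to it).
records = [
    (date(2026, 1, 9), 12, 1),
    (date(2026, 1, 23), 30, 0),
    (date(2026, 2, 6), 8, 2),
    (date(2026, 2, 20), 45, 0),
    (date(2026, 3, 13), 50, 1),
]

MIN_CYCLES = 2  # assumed calibration threshold: below this, report nothing rather than guess


def monthly_failure_rates(records, min_cycles=MIN_CYCLES):
    """Failures per 100 files touched, per calendar month; None while data is too thin."""
    buckets = defaultdict(lambda: [0, 0, 0])  # (year, month) -> [cycles, files, failures]
    for shipped, files, failures in records:
        bucket = buckets[(shipped.year, shipped.month)]
        bucket[0] += 1
        bucket[1] += files
        bucket[2] += failures

    rates = {}
    for month, (n, files, failures) in sorted(buckets.items()):
        enough_data = n >= min_cycles and files > 0
        rates[month] = round(100 * failures / files, 1) if enough_data else None
    return rates


rates = monthly_failure_rates(records)
for month, rate in rates.items():
    print(month, "insufficient data" if rate is None else rate)
```

Returning `None` for sparse months, rather than a rate computed from one or two changes, matches the guidance above to favor learning over conclusions early on.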
Once patterns become visible, they are often difficult to unsee.
We welcome contributions to expand and improve the System Health & Architecture Monitor project. If you’re interested, please check out the contributing guidelines for more information.
- Author: William Shostak (https://github.com/wshostak)
This project is licensed under the ISC License — see the LICENSE file for more details.
Copyright (c) 2026 William Shostak