Why Formal Verification Isn’t Just Fancy Exhaustive Testing

The coffee breaks at Systems Distributed ’25 were full of insightful, thought-provoking conversations, and my background in formal methods and formal verification sparked many of the same discussions. One question I kept encountering was:

Isn’t exhaustive deterministic testing just the same as formal verification?

It’s a good question, and it’s tempting to answer “yes”. After all, both aim for correctness and involve exploring states. However, exhaustively testing a real implementation — even with a deterministic simulator — is fundamentally different from writing and analyzing a formal model in a precise specification language like TLA+, Alloy, or P. Let’s unpack why.

❦ ❦ ❦

A Note on Terminology

When people compare exhaustive testing to formal methods, they typically refer to formal modeling in a specification language like TLA+, Alloy, or P, combined with model checking as a verification technique. While many other formal methods exist (such as theorem proving or type systems), this article focuses specifically on formal models and their verification.

We’ll explore their benefits, from technical gains to human factors, and also discuss important caveats to consider before incorporating them into your workflow. Let’s start with the benefits.

1. A Design Artifact and Communication Tool

A formal model isn’t just a verification tool — it’s a second source of truth about your system. It often becomes the most precise expression of what the system is supposed to do (much more precise than natural-language documentation), independently of what the code actually does. In fact, we often find imprecise or inconsistent parts of the technical documentation when constructing a formal model from it: a formal specification language forces us to be extra precise, so discrepancies automatically become apparent. The model serves as a communication artifact across teams, and often survives well beyond the code that implements it. Being small and focused, it can serve as an artifact that multiple stakeholders (e.g., developers, architects, testers, product owners) can understand and agree on — think of it as executable documentation. Finally, as a code artifact, it enjoys all the benefits we like as engineers: version control, code review, continuous integration, and easy sharing.

2. Laser Focus and Exploration Speed

A formal model isolates core algorithmic complexity by stripping away irrelevant implementation machinery — network libraries, serialization code, database drivers — so you can focus on what really matters. In distributed systems, the interesting logic (like leader election, quorum selection, or recovery) is often buried within layers of code. Some paths of this core logic might be unreachable due to the way the higher system layers are currently implemented, or the way the test harness is set up — but they are lurking just out of sight, and may well become relevant with a later code change. Modeling abstracts those layers away, letting you extract and verify just the core protocol, including the paths your current harness can never reach.
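To make this concrete, here is a deliberately tiny TLA+ sketch of such an extracted core: a hypothetical quorum-based election, with every name and detail invented for illustration. Everything below the protocol (networking, serialization, timeouts) simply does not exist in the model:

```tla
------------------------- MODULE LeaderElection -------------------------
EXTENDS Naturals, FiniteSets

CONSTANT Node                \* e.g., {"n1", "n2", "n3"}

VARIABLES votes,             \* votes[c] = set of nodes that voted for c
          leader             \* the elected leader, or "none"

Init == /\ votes = [c \in Node |-> {}]
        /\ leader = "none"

\* Any node that hasn't voted yet casts its single vote for any
\* candidate. Non-determinism stands in for all the real messaging.
Vote(v, c) ==
    /\ \A d \in Node : v \notin votes[d]
    /\ votes' = [votes EXCEPT ![c] = @ \cup {v}]
    /\ UNCHANGED leader

\* A candidate with a strict majority becomes leader in one step.
Elect(c) ==
    /\ leader = "none"
    /\ 2 * Cardinality(votes[c]) > Cardinality(Node)
    /\ leader' = c
    /\ UNCHANGED votes

Next == \E v, c \in Node : Vote(v, c) \/ Elect(c)

\* The core property: two distinct candidates can never both reach
\* a majority, because each node votes at most once.
AtMostOneMajority ==
    \A c, d \in Node :
        (2 * Cardinality(votes[c]) > Cardinality(Node)
         /\ 2 * Cardinality(votes[d]) > Cardinality(Node)) => c = d
=========================================================================
```

Checking AtMostOneMajority over a three-node instance takes TLC seconds, precisely because nothing besides the core logic exists in the state.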

In a similar way, modeling lets you ignore the “noise” of irrelevant implementation details, thereby significantly speeding up state exploration. As an extreme example, consider exhaustively testing a function taking a single uint128 argument. Even if testing just one of the ≈3.4028×10³⁸ possible argument values (roughly 340 undecillion) takes only a nanosecond, it would still take over 10²² years to test them all. For context, even a uint64 would take ~584 years, illustrating the exponential growth of the state space. Your great-great-great-great-great-great-grandchildren will be proud of you for trying, but this is why exhaustive testing is often impractical for large state spaces. Manually writing a minimal, abstract model is often the only way to exhaustively explore such expansive behaviors.
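For the skeptical, here is the back-of-the-envelope arithmetic behind those numbers, assuming one value tested per nanosecond (10⁹ values per second):

```latex
2^{128} \approx 3.40 \times 10^{38} \text{ values}

\frac{3.40 \times 10^{38} \text{ values}}{10^{9} \text{ values/s}}
  \approx 3.40 \times 10^{29} \text{ s}

\frac{3.40 \times 10^{29} \text{ s}}{3.156 \times 10^{7} \text{ s/year}}
  \approx 1.1 \times 10^{22} \text{ years}
```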

The same argument applies to boilerplate implementation functionality like logging, metrics, network communication, or disk access, which increases the system’s state space without affecting the correctness of the core protocol. “But hey, these are exactly the things that may break my system in unforeseen ways!” you say? True! And that’s precisely why I don’t argue for using formal methods instead of testing. Rather, I advocate for using them both or, even better, together.

Finally, some advanced formal verification techniques let you explore classes of behaviors symbolically. For instance, you can reason about “any number of clients” without explicitly instantiating every combination. This is impossible to fully exercise with exhaustive testing, because there is always a “one larger” case you haven’t tested yet.
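As a sketch of what that looks like in practice, the hypothetical spec below leaves the number of clients N symbolic. Explicit-state checkers like TLC still need a concrete value, but symbolic tools (e.g., Apalache) can accept a constraint on N instead of a fixed value and cover every satisfying value at once:

```tla
--------------------------- MODULE AnyN ---------------------------
\* Hypothetical sketch: N clients acknowledge, one at a time.
EXTENDS Integers

CONSTANT N           \* the number of clients, left symbolic
VARIABLE acked       \* how many acknowledgements arrived so far

Init == acked = 0

Next == acked < N /\ acked' = acked + 1

\* We want this for *every* N > 0, not only for the handful of
\* values a test harness happens to instantiate.
Inv == acked <= N

\* A constraint instead of a concrete value, for symbolic checking.
ConstInit == N > 0
===================================================================
```

And if you need the fully unbounded case, theorem provers like TLAPS can discharge it outright, at the cost of writing the proof yourself.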

3. “Shift Left”: Explore Before You Code

You can start building and analyzing a model before the implementation exists, validating architectural assumptions early even while some pieces are still left abstract. This helps you catch design flaws before writing any production code — the cheapest time to fix them.

This is invaluable for early-stage design, where you’re still debating consistency models, failure semantics, or protocol steps. Because the spec lives outside the implementation language, compiler, runtime, and test framework, you can iterate quickly without committing to a specific tech stack, and explore alternative designs at minimal cost. There is none of the bleed-over that often happens in code, where one decision contaminates another and “changing your mind” might mean rewriting thousands of lines.

In later stages, the model becomes a useful long-term companion. It pays dividends during rewrites or migrations, when you can tweak the model to experiment with protocol changes or verify that new constraints still preserve key properties. Because the model is small, self-contained, and independent of infrastructure, it stays flexible — long after the first implementation is set in stone.

4. Non-Determinism & Atomicity

In a formal model, non-determinism is deliberate and visible. You can explicitly describe things like message reordering, process interleaving, node crashes, and retry races. For example, instead of writing complex test harnesses to trigger a specific message reordering scenario, a formal model allows you to simply state that messages can be reordered. The model checker then automatically explores all valid reorderings, systematically checking for issues. In contrast, these edge cases in real systems show up only occasionally — you have to deliberately cause or control them from your test harness.
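For example, here is a hypothetical TLA+ fragment in that style, with the message contents invented. The network is just a set of in-flight messages, so reordering is the default behavior rather than a scenario you have to engineer:

```tla
-------------------------- MODULE Reorder --------------------------
EXTENDS Sequences

VARIABLES msgs,       \* in-flight messages; a set, so order is lost
          delivered   \* the sequence of messages processed so far

Init == /\ msgs = {"m1", "m2", "m3"}
        /\ delivered = << >>

\* Receive picks *any* in-flight message. The model checker then
\* explores every one of the 3! = 6 delivery orders automatically.
Receive == \E m \in msgs :
              /\ msgs' = msgs \ {m}
              /\ delivered' = Append(delivered, m)

Next == Receive
====================================================================
```

No sleeps, no fault injection, no flaky retries: the existential quantifier is the harness.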

In addition, precision about system states is hard to achieve in regular code: When is your system “in a state” where the property must hold? Is it really at the boundary of that function call invoked from your harness, or is it somewhere in the middle of function execution, when the change becomes externally visible? How do you exclude “invisible” intermediate states that shouldn’t be considered? And finally, how do you find these boundaries in a complex system with many independent components and layers of abstraction?

Specification languages, on the other hand, have clean, well-defined semantics. For example, TLA+ lets you declare actions that are atomic by construction, making it easy to separate “what must happen together” (one logical step) from the (externally) observable states at which the system’s properties are expected to hold. This precision can in turn help you write better code, because the system boundaries have already been clearly defined in the model.
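To illustrate, here is a hypothetical two-account spec. The debit and the credit form a single atomic action, so the “half-transferred” state that plagues real code simply does not exist in the model:

```tla
-------------------------- MODULE Transfer --------------------------
EXTENDS Integers

VARIABLES alice, bob

Init == alice = 10 /\ bob = 10

\* "What must happen together": debit and credit are one logical
\* step by construction, not two statements racing in real code.
Transfer(amt) ==
    /\ amt \in 1..alice          \* can't overdraw
    /\ alice' = alice - amt
    /\ bob'   = bob + amt

Next == \E amt \in 1..10 : Transfer(amt)

\* Evaluated only at the observable states between atomic steps.
TotalConserved == alice + bob = 20
=====================================================================
```

In an implementation, the same invariant is momentarily false between the debit and the credit; in the model, that instant is not an observable state at all.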

❦ ❦ ❦

Caveats

Of course, formal modeling and verification aren’t free. Like all powerful tools, they have tradeoffs worth understanding. Here are some caveats that you should consider before adopting them in your workflow:

False sense of security
Model checking verifies your model — not your implementation. This is a critical distinction! However, you can significantly alleviate this risk by connecting the model with the implementation via model-based differential testing or test-case generation. This ensures that changes in the model are reflected and tested in the code, and vice versa, significantly reducing the maintenance burden and closing the gap between spec and reality.
Learning curve
Your team already knows how to write code, and (exhaustive) testing is a natural extension of that. Formal specification languages have their own syntax and semantics, which can take time to learn. However, this is often a one-time investment that pays off in the long run.
Strengths & Tooling
Not all formal methods have mature tooling or integrations with existing development environments. This can make it harder to adopt them in practice. It’s also worth noting that different tools and specification languages excel in different application areas (e.g., TLA+ is particularly strong for distributed systems). I recommend thoroughly evaluating the available tools or getting expert guidance before committing to a specific technology.
Engineering overhead
Writing and maintaining a formal model requires effort, which may not always be justified for smaller projects or simpler systems. In larger systems, it usually pays off early in design and refactoring — but also in your day-to-day work, if you connect the model to the implementation (see above), ideally in CI. For example, AWS is very vocal about how formal methods have increased their confidence in critical changes.
Verification overhead
Full-blown model checking can be slow, especially for large state spaces. However, many tools support alternative model exploration strategies — often inspired by testing — such as concrete or symbolic simulation.

❦ ❦ ❦

TL;DR

Formal verification (and model checking in particular) is not just exhaustive testing — it’s a fundamentally different activity. It’s about precisely defining what your system should do, then (optionally) exploring all the ways it could go wrong, at minimal cost.

Exhaustive testing is powerful, but combining it with formal modeling gives you something even better: precise understanding.