What are executable specifications?

Runnable models, not just Markdown docs.

The short version

A Markdown spec says what should happen. An executable spec can try things and check whether they still work.

That matters when bugs depend on ordering, retries, stale reads, timeouts, garbage collection, leader changes, or partial failure.

Some people call this specification-driven development. The important part is that the spec is runnable code, not prose or Markdown.

Example: a DNS balancer control plane

This example models a DNS load balancer control plane with garbage collection of stale plans. The full model has several actions; this snippet shows just one of them.

@action(inline=False)
def deployer_gc(c: Context[DnsBalancerState], deployer: Annotated[Expr, str]):
    """Deployer cleans up old plans."""
    s = c.state
    next_plan = s.deployer_next_plan[deployer]
    old_indices = s.planner_updates.keys.filter(
        lambda i: next_plan > i + s.MAX_PLAN_AGE
    )
    old_plan_entries = Set(
        s.planner_updates[i].creates for i in old_indices
    ).flattened

    # avoid unrealistic no-op cleanups
    c.assume(~(old_plan_entries & s.zone_records).is_empty)

    # implementation may do this in a loop;
    # the spec models the abstract effect
    s.zone_records -= old_plan_entries


@invariant
def no_inconsistent_root(s: DnsBalancerState):
    """Root must always point to an existing plan."""
    cnames = SetIf(r.kind == CNAME for r in s.zone_records)
    plans = Set(r.value[0] for r in SetIf(r2.name == ZONE for r2 in cnames))
    return ~((plans & Set(r.name for r in cnames)).is_empty) & (plans.size == 1)

From design intent to executable behavior

In executable specifications, we focus on the high-level effects: what changes in the system, not every API call or loop. The invariant at the bottom is now an executable check: after any sequence of actions, the root still points to a plan that exists. A Markdown sentence can describe that rule. An executable check can catch the regression.

One executable model, four uses

  1. 01

    Exploration

    Generate schedules that people would not write by hand, before the system reaches production.

  2. 02

    Test generation

    Generate scenarios and expected outcomes, then replay them against APIs, CLIs, simulators, or test harnesses.

  3. 03

    Review artifact

    Make atomicity, freshness, retries, rollback, and garbage-collection assumptions visible to the team.

  4. 04

    Living documentation

    Keep a runnable reference for behavior the team depends on, instead of a stale diagram no one trusts.

When executable specs are worth it

  1. 1

    Your system manages complex state. It implements a control loop, scheduler, protocol, database, controller, or reconciliation process.

  2. 2

    Correctness depends on timing or ordering. Retries, stale observations, partial failure, and delayed messages can change the outcome.

  3. 3

    Incidents are hard to reproduce. The failing path is rare, long, or depends on a schedule your tests do not cover.

  4. 4

    Important invariants are implicit. They live in docs, code comments, tribal knowledge, or review checklists instead of a runnable reference.

Want to learn this with your team?

Workshops are a good first step.

Focused pilots and implementation support are available when there is a concrete subsystem to model.

Discuss training or a pilot
  • Best fit: control planes, schedulers, Kubernetes controllers, payment systems, consensus protocols, databases, and stateful cloud infrastructure.
  • Remote or on-site. No prior formal-methods experience required.
  • A first call is 30 minutes, no preparation needed, and there is no commitment.