Find bugs before production.
Avoid costly outages and midnight debugging.
I'm Dr. Thomas Pani. I help cloud-native, platform, infrastructure, and protocol teams build practical reliability workflows around executable specifications: runnable models, not just Markdown docs.
Find bugs before production.
Avoid costly outages and midnight debugging.
I'm Dr. Thomas Pani. I help cloud-native, platform, infrastructure, and protocol teams build practical reliability workflows around executable specifications: runnable models, not just Markdown docs.
Training & Workshops
I offer hands-on workshops and training for teams building distributed systems, stateful platforms, and cloud-native infrastructure. Recent examples below. For a custom workshop, get in touch.
DevConf 2026 · Brno, CZ
Find Distributed-Systems Bugs Before Production
Testing and Verification for Complex Systems
Better reliability usually does not come from "more tests". It comes from testing the right behavior.
Some people call this specification-driven development. I care about the executable part: the spec is runnable code, not Markdown.
What this looks like
Model behavior
Capture states, actions, failures, and invariants for a real subsystem.
Generate traces
Use simulation, fuzzing, or search to find paths people do not write by hand.
Check the system
Turn traces into model-based tests, conformance checks, reviews, or CI.
Ways to work together
Start with a workshop
Model a realistic control plane, scheduler, protocol, or reconciliation loop. Turn traces into tests.
Continue with a pilot
Pick one critical subsystem. Build an executable model, search for counterexamples, and connect it to the implementation.
Who This Is For
-
1
You own a stateful or distributed system where reliability matters. Retries, timeouts, reconciliation, partial failure, and stale observations can result in failures.
-
2
Your current tests miss rare behaviors. Example tests and E2E scenarios cover known paths, but not enough message orderings, failures, or long state-transition sequences.
-
3
Important invariants are implicit. The team knows what must never happen, but the test suite lacks an executable oracle for correct behavior.
-
4
You may be using AI-generated infrastructure or protocol code. If so, executable specs give humans and agents a runnable reference during generation, review, and CI.
Why Executable Specs?
Complex systems fail in behavior, not just inputs. Executable specs make that behavior runnable.
-
01
Rare schedules cause real outages
Production failures often happen on paths nobody wrote tests for: interleavings of messages, retries, timeouts, stale reads, and partial failures.
-
02
AI-generated code still needs an oracle
LLMs can produce plausible infrastructure and protocol code quickly. They also miss interleavings, race conditions, atomicity assumptions, and invariants.
Executable specs turn intended behavior into runnable code. They generate paths, act as test oracles, and give humans and agents a reference during generation, review, and CI. Learn what executable specs are.
How this fits with other testing Show details
Executable specs are not a replacement for the rest of your testing stack. They strengthen it where the missing piece is a behavioral oracle and early exploration of rare schedules.
Hypothesis / PBT / Fuzzing
Excellent input and sequence generation. Specs provide stronger expected behavior than "does not crash".
Jepsen
Finds real failures in running distributed systems. A spec tells you what should have happened.
Antithesis / Deterministic Simulation
Improves reproducibility under faults and schedules. Specs provide the missing correctness oracle.
TLA+, Quint, FizzBee
Good spec-first options. The right choice depends on whether you need design modeling, model checking, MBT, proofs, or implementation-facing tests.
Customer Spotlights
Recent examples of executable specs, model checking, protocol fuzzing, and reliability work.
Aztec Labs: Governance Protocol
2025Formal specification and symbolic verification of 125 invariants across a multi-contract governance system.
Ethereum Foundation: 3-Slot Finality
2024Formal modeling and verification of accountability in Ethereum 3SF consensus.
Open Source
Apalache: Symbolic Model Checker for TLA+ and Quint
CurrentCore team member; used on Aztec governance, Ethereum 3SF, Tendermint/CometBFT, and Matter Labs/ZKsync protocols.
Quint: Modern Language Tooling for TLA+
PastPreviously co-developed Quint from its early stages at Informal Systems, bringing TypeScript-like syntax to TLA+-style specs.
Get in Touch Limited Availability
Want to try this with your team?
Workshops are a good first step. Focused pilots and implementation support are available when there is a concrete subsystem to model.
Discuss training or a pilot- Best fit: control planes, schedulers, Kubernetes controllers, payment systems, consensus protocols, databases, and stateful cloud infrastructure.
- A first call is 30 minutes, no preparation needed, and there is no commitment.
- I typically respond within one business day.