>Deterministic simulation. Another lightweight method widely used at AWS is deterministic simulation testing, in which a distributed system is executed on a single-threaded simulator with control over all sources of randomness, such as thread scheduling, timing, and message delivery order. Tests are then written for particular failure or success scenarios, such as the failure of a participant at a particular stage in a distributed protocol. The nondeterminism in the system is controlled by the test framework, allowing developers to specify orderings they believe are interesting (such as ones that have caused bugs in the past). The scheduler in the testing framework can also be extended for fuzzing of orderings or exploring all possible orderings to be tested.
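A minimal sketch of the shape of such a simulator, in Go (illustrative only, not any particular framework's API): all "concurrency" is reduced to a queue of pending events, and a seeded PRNG picks the delivery order, so every run is fully reproducible from its seed.

```go
// A toy single-threaded simulator: "sends" become queued events, and a
// seeded PRNG chooses the delivery order, so a run is fully determined
// by its seed. (Illustrative only; not any particular framework's API.)
package main

import (
	"fmt"
	"math/rand"
)

type event struct {
	name string
	fire func()
}

type simulator struct {
	rng     *rand.Rand
	pending []event
}

// send enqueues a message instead of delivering it immediately.
func (s *simulator) send(name string, deliver func()) {
	s.pending = append(s.pending, event{name, deliver})
}

// run drains the queue in an order chosen by the seeded PRNG; handlers
// may enqueue further messages, modeling multi-step protocols.
func (s *simulator) run() {
	for len(s.pending) > 0 {
		i := s.rng.Intn(len(s.pending))
		e := s.pending[i]
		s.pending = append(s.pending[:i], s.pending[i+1:]...)
		e.fire()
	}
}

func main() {
	// Same seed => same message ordering => reproducible failures.
	sim := &simulator{rng: rand.New(rand.NewSource(1))}
	acks := 0
	for _, node := range []string{"a", "b", "c"} {
		node := node // capture loop variable (pre-Go 1.22)
		sim.send("prepare->"+node, func() {
			sim.send("ack<-"+node, func() { acks++ })
		})
	}
	sim.run()
	fmt.Println("acks:", acks) // always 3, in a seed-determined order
}
```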
This is fucking amazing
Loom for Rust does this. We also adapted it for some C++ with good success (it found actual bugs that had slipped past tests and reviews). It sits at a lower level than, e.g., TLA+ and is not a proof, but it's super useful because you check the actual implementation.
DST was i̶n̶v̶e̶n̶t̶e̶d̶ popularized at FoundationDB a little over a decade ago, and has been quietly gathering steam ever since. If you’re interested in the technique, the FDB paper has some good info in section 4 and section 6.2: https://www.foundationdb.org/files/fdb-paper.pdf
(Disclosure: I am an author.)
I also gave a talk overview of it at Strange Loop back in 2015, but don’t have the youtube link handy.
If you’re interested in trying DST, we have a new company that aims to make it a much easier lift, especially for existing projects that can’t be retrofitted onto one of the many simulation frameworks: https://antithesis.com
Happy to answer any questions about the approach, here or over email.
Hey Will. I'm a huge fan of the work you all are doing, and of FoundationDB, but I don't believe it's accurate that DST was invented at FoundationDB (or, maybe it was, but was also used in other places around the same time or before).
For example, the first implementations of AWS's internal lock service (Alf) used DST as a key part of the testing strategy, sometime around 2009. Al Vermeulen was influential in introducing it at AWS, and I believe it built on some things he'd worked on before.
Still, Antithesis is super cool, and I really admire how you all are changing the conversation around systems correctness. So this is a minor point.
Also a huge proponent of Antithesis and their current work, but there definitely were some notable precedents at or around that time e.g. MODIST from 2009 (https://www.usenix.org/legacy/event/nsdi09/tech/full_papers/...), which similarly tried to build a "model checker for distributed systems".
As another interesting historical side note, I have wondered about the similarities between Antithesis and "Corensic", a startup spun out of UW around a similar time period (circa 2009). Apparently, they "built a hypervisor that could on-demand turn a guest operating system into deterministic mode" (see "Deterministic Multiprocessing" at https://homes.cs.washington.edu/~oskin/). My impression is that their product was not a significant commercial success, and the company was acquired by F5 Networks in 2012 (https://comotion.uw.edu/startups/corensic/).
Overall, I don't over-index on novelty, and think it is generally good for ideas to be recycled/revived/re-explored, with updated, modern perspectives. I believe that most rigorous systems designers/engineers likely converge to similar ideas (model checking, value of determinism, etc.) after dealing with these types of complex systems for long enough. But, it is nevertheless interesting to trace the historical developments.
Corensic was impressive tech. I actually debriefed with one of their founders years ago. IIRC, their product was focused on finding single-process concurrency bugs.
Deterministic hypervisors are by no means new. Somebody once told me that VMware used to support a deterministic emulation mode (mostly used for internal debugging). Apparently they lost the capability some time ago.
Hi Marc, thank you for the correction! We started doing it around 2010, and were not aware of any prior art. But I am not surprised to hear that others had the idea before us. I will give Al credit in the future.
The TigerBeetle team does this too and it's an interesting approach. One of their developers did a talk on it at HYTRADBOI this year [0].
[0] https://www.hytradboi.com/2025/c222d11a-6f4d-4211-a243-f5b7f...
They also made a browser game to showcase their DST, where you can inject faults and see how TigerBeetle recovers.
Here's their blog post from 2023 explaining what they did: https://tigerbeetle.com/blog/2023-07-11-we-put-a-distributed...
Similar to the integration testing component in keploy.io, which includes time freezing.
Basically, certain system clocks are rolled back and incremented in a deterministic manner to emulate real issues recorded from production bugs. I had a lot of fun building this and seeing the effects of messing with different system clocks by intercepting system calls.
Also somewhat similar to what Antithesis is doing.
Which is largely an extended version of what FoundationDB does (from many of the same people).
Microsoft has a flavor of this for .NET called Coyote[0]
Fray (https://github.com/cmu-pasta/fray) is a concurrency testing framework for Java. It also does deterministic simulation.
It ought to be the norm, and it is (slowly) getting there. For example, the .NET 8 standard library now has a TimeProvider type to allow test frameworks to provide substitute implementations of the system clock.
Guessing an ordering that will trigger a bug and exploring all orderings should both be basically impossible for non-trivial scenarios, no?
Perhaps, but there is still some value in "capturing the bug with a failing test." If some ordering presents a defect, you can recreate it and prove that you've fixed the bug.
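A hedged sketch of that seed-replay pattern in Go (`runScenario` and `badSeed` are hypothetical stand-ins for a real harness): fuzz orderings from a seeded PRNG, log the seed on failure, and pin the failing seed as a regression test.

```go
// The PRNG fully determines the ordering, so logging the seed on failure
// is enough to reproduce the exact interleaving later.
package example

import (
	"math/rand"
	"testing"
)

// runScenario delivers simulated events in an order chosen by rng and
// returns an error if an invariant of the system under test is violated.
// (Hypothetical harness, for illustration.)
func runScenario(rng *rand.Rand) error {
	events := []string{"prepare", "commit", "participant-crash"}
	for len(events) > 0 {
		i := rng.Intn(len(events))
		_ = events[i] // deliver events[i] to the system under test here
		events = append(events[:i], events[i+1:]...)
	}
	return nil
}

func TestRandomOrderings(t *testing.T) {
	for iter := 0; iter < 1000; iter++ {
		seed := rand.Int63()
		if err := runScenario(rand.New(rand.NewSource(seed))); err != nil {
			// The seed is all you need to recreate this exact ordering.
			t.Fatalf("failed with seed %d: %v", seed, err)
		}
	}
}

// Once a failing seed is found (and the bug fixed), pin it as a regression test.
func TestRegression(t *testing.T) {
	const badSeed = 42 // hypothetical seed captured from a past failure
	if err := runScenario(rand.New(rand.NewSource(badSeed))); err != nil {
		t.Fatalf("regression reappeared: %v", err)
	}
}
```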
I want something like this, but I work almost entirely in Go.
Am I correct in thinking that this would require a fork of the main Go implementation, in order to make a deterministic (and controllable) goroutine scheduler and network stack?
Does one also need to run on a particularly predictable OS?
For in-process go scheduling, some progress has been made here; see: https://go.dev/blog/synctest
But `synctest.Wait` won't work for goroutines that aren't durably blocked (such as ones in the middle of a TCP syscall), so it requires in-memory implementations of e.g. net.Conn (I've plugged in https://pkg.go.dev/google.golang.org/grpc/test/bufconn with good success).
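For reference, a test in the style of the synctest blog post linked above (testing/synctest is experimental as of Go 1.24 and needs GOEXPERIMENT=synctest, so details may change):

```go
// Requires GOEXPERIMENT=synctest (Go 1.24). Inside the bubble the clock is
// fake and only advances when every goroutine in the bubble is durably blocked.
package example

import (
	"context"
	"testing"
	"testing/synctest"
	"time"
)

func TestWithTimeout(t *testing.T) {
	synctest.Run(func() {
		const timeout = 5 * time.Second
		ctx, cancel := context.WithTimeout(context.Background(), timeout)
		defer cancel()

		// Sleep to just before the deadline: the context must still be live.
		time.Sleep(timeout - time.Nanosecond)
		synctest.Wait() // let the timer goroutine settle
		if err := ctx.Err(); err != nil {
			t.Fatalf("before deadline: ctx.Err() = %v, want nil", err)
		}

		// Cross the deadline: the context must now be expired.
		time.Sleep(time.Nanosecond)
		synctest.Wait()
		if err := ctx.Err(); err != context.DeadlineExceeded {
			t.Fatalf("after deadline: ctx.Err() = %v, want DeadlineExceeded", err)
		}
	})
}
```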
That's not enough for proof purposes. It lets you build a test that deterministically exercises one path, but it doesn't give you control over all possible orderings.
In fact I seem to be a bit iconoclastic on this matter, but I'm not even a fan of it for testing purposes. Even non-proof-based testing needs the ability to check that goroutines may execute out of order. Nothing prevents a goroutine scheduled for 2ms from now from running all the way to completion before a goroutine scheduled to run 1ms from now even starts, but AFAIK this approach doesn't let you test that case. Such logic is not valid in a multithreaded system; the timer order is at most the most likely progression, not the only possible one.
But since we live in a world where the question is more whether anyone will write a concurrency test at all, moaning about not having a perfect one is missing the point, I suppose. I won't deny that a deterministic test is better than a non-deterministic one in general.
There may be more here than meets the eye.
I'm not saying this is easy to do at the moment, but the possibility is there.
If you can take control of all timers/sleep in your Go program, then you have, in effect, complete control of Goroutine scheduling.
If you assign a distinct time point to each goroutine, say based on its ID, and then have the goroutine sleep until that point, then you have also assigned the order in which those goroutines will run. Each will "wake up" alone, only at the next point at which the clock is forced to advance--which is when the previous goroutine blocked.
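A runnable sketch of that trick (all names here are made up for illustration; a real harness such as testing/synctest detects quiescence itself rather than being told how many sleepers to expect):

```go
// Each goroutine sleeps until a distinct fake-time point, so the assigned
// deadlines fully determine the execution order. The driver waits for each
// woken goroutine to finish before advancing again, which is what keeps
// exactly one goroutine running at a time.
package main

import (
	"fmt"
	"runtime"
	"sync"
)

type fakeClock struct {
	mu      sync.Mutex
	now     int64
	waiters map[int64]chan struct{} // one sleeper per distinct deadline
}

func newFakeClock() *fakeClock {
	return &fakeClock{waiters: make(map[int64]chan struct{})}
}

// SleepUntil blocks the caller until the fake clock reaches t.
func (c *fakeClock) SleepUntil(t int64) {
	c.mu.Lock()
	if t <= c.now {
		c.mu.Unlock()
		return
	}
	ch := make(chan struct{})
	c.waiters[t] = ch
	c.mu.Unlock()
	<-ch
}

// advanceWhenAll spins until n sleepers are registered, then advances the
// clock to the earliest deadline and wakes exactly that sleeper. (Polling
// keeps the sketch short; a real harness integrates with the scheduler.)
func (c *fakeClock) advanceWhenAll(n int) {
	for {
		c.mu.Lock()
		if len(c.waiters) == n {
			next := int64(0)
			for t := range c.waiters {
				if next == 0 || t < next {
					next = t
				}
			}
			c.now = next
			close(c.waiters[next])
			delete(c.waiters, next)
			c.mu.Unlock()
			return
		}
		c.mu.Unlock()
		runtime.Gosched() // let the sleepers register
	}
}

func main() {
	c := newFakeClock()
	done := make(chan int64)
	for i := int64(1); i <= 3; i++ {
		go func(id int64) {
			c.SleepUntil(id) // deadline == desired position in the order
			done <- id
		}(i)
	}
	for remaining := 3; remaining > 0; remaining-- {
		c.advanceWhenAll(remaining)
		// Wait for the woken goroutine before advancing again.
		fmt.Println("goroutine", <-done, "ran") // always prints 1, 2, 3
	}
}
```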
"Am I correct in thinking that this would require a fork of the main Go implementation, in order to make a deterministic (and controllable) goroutine scheduler and network stack?"
Yes, because the Go runtime actually leans explicitly and deliberately in the opposite direction. Additional nondeterminism is deliberately added to the system: most notably, a "select" with multiple ready channels chooses among them pseudorandomly, and map iteration order is scrambled on each traversal. It's not truly random, and as such not suitable for pretty much any randomness requirement you may have, but it's enough to generally prevent accidental dependencies on deterministic ordering.
This has the effect of "spreading out" concurrency bugs, hopefully reducing the odds that you only discover a major one when something else in the environment changes three years later. It's a nice default, but nothing like a proof, of course.
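Both behaviors are easy to observe (run this a few times and the outputs vary):

```go
// Demonstrates the deliberate nondeterminism described above: with two
// ready channels, Go's select chooses pseudorandomly, and map iteration
// order varies between traversals.
package main

import "fmt"

func main() {
	a, b := make(chan string, 1), make(chan string, 1)
	a <- "a"
	b <- "b"
	// Both cases are ready; the runtime picks one pseudorandomly.
	select {
	case v := <-a:
		fmt.Println("select chose", v)
	case v := <-b:
		fmt.Println("select chose", v)
	}

	m := map[string]int{"x": 1, "y": 2, "z": 3}
	// Iteration order is intentionally randomized per range loop.
	for k := range m {
		fmt.Println("first key seen:", k)
		break
	}
}
```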
You can also go the "predictable OS" route, since the entire Go runtime sits on top of it. Nothing can non-deterministically run on top of a fully deterministic OS image. But a fully deterministic OS would probably want some integration with the Go runtime to be able to more deeply control it through mechanisms other than varying inputs to the random function, which is a very indirect way to go down a code path.
I worked on adding Golang time freezing in Keploy using the approach from the Go playground, which starts from a fixed time (2009-11-10 23:00:00 UTC) and increments the clock seen by goroutines deterministically.
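Roughly, the clock semantics look like this (a hedged sketch; the real implementations intercept syscalls or runtime timers rather than going through an object like this, and the names here are illustrative):

```go
// Playground-style frozen time: the clock starts at a fixed epoch, and
// Sleep advances the fake clock instead of blocking, so timestamps are
// identical across runs.
package main

import (
	"fmt"
	"sync"
	"time"
)

// epoch matches the Go playground's well-known start time.
var epoch = time.Date(2009, 11, 10, 23, 0, 0, 0, time.UTC)

type frozenClock struct {
	mu     sync.Mutex
	offset time.Duration
}

// Now returns the fixed epoch plus however much fake time has "passed".
func (c *frozenClock) Now() time.Time {
	c.mu.Lock()
	defer c.mu.Unlock()
	return epoch.Add(c.offset)
}

// Sleep advances the fake clock immediately rather than waiting.
func (c *frozenClock) Sleep(d time.Duration) {
	c.mu.Lock()
	c.offset += d
	c.mu.Unlock()
}

func main() {
	c := &frozenClock{}
	fmt.Println(c.Now()) // 2009-11-10 23:00:00 +0000 UTC, every run
	c.Sleep(3 * time.Second)
	fmt.Println(c.Now()) // 2009-11-10 23:00:03 +0000 UTC, every run
}
```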
This is what hermit does, although it's no longer actively developed.