faultbox
Fault injection for distributed systems.
Intercept syscalls and protocol messages to test how your services behave
under failure.
api = service("api", binary="./api", http="localhost:8080")
db = service("db", binary="./db", tcp="localhost:5432")

def test_write_failure(t):
    fault(db, write=deny("EIO"))
    resp = api.http.post("/orders", json={"item": "widget"})
    assert_eq(resp.status, 503, "API should return 503 when DB fails")

$ faultbox test faultbox.star
PASS test_write_failure (0.42s)
✓ fault(db, write=deny("EIO"))
✓ POST /orders → 503
✓ assert_eq(resp.status, 503)

Install
curl -fsSL https://faultbox.io/install.sh | sh
Detects your platform, downloads the latest release, and verifies the checksum. Or build from source.
Why Faultbox
Syscall-level injection
Deny, delay, or hold any syscall via seccomp-notify. No eBPF, no
ptrace, no code changes. Faultbox automatically expands syscall
families: write covers write, writev, and pwrite64.
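To make the family expansion concrete, here is a minimal Python sketch of the idea. It is not Faultbox's implementation: the table holds only the one family named above, and the names SYSCALL_FAMILIES, expand, and deny_rule are hypothetical, chosen for illustration.

```python
import errno

# Hypothetical family table; only the "write" family is taken from the text above.
SYSCALL_FAMILIES = {
    "write": ("write", "writev", "pwrite64"),
}

def expand(name):
    """Return every syscall covered by a family name, or the name itself."""
    return SYSCALL_FAMILIES.get(name, (name,))

def deny_rule(name, errno_name):
    """Map each syscall in the family to the numeric errno it should fail with."""
    code = getattr(errno, errno_name)  # e.g. "EIO" -> errno.EIO
    return {syscall: code for syscall in expand(name)}

rule = deny_rule("write", "EIO")
print(sorted(rule))
```

A single deny on write therefore yields one rule per member of the family, so a service that happens to call writev still hits the fault.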
Protocol-level injection
Inject faults at the protocol level for HTTP, gRPC, Postgres, MySQL, Redis, Kafka, NATS, MongoDB, AMQP, and Memcached. Target specific queries, paths, or topics via a transparent proxy.
Deterministic exploration
hold() and release() control syscall ordering across services.
The --explore mode walks all interleavings automatically.
Seed-based replay makes failures reproducible.
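As a rough illustration of what walking all interleavings means, the Python sketch below enumerates every ordering of two services' operations that preserves each service's own order, then picks one reproducibly from a seed. It is a conceptual model only, not how --explore is implemented; the operation names and the helpers interleavings and pick_schedule are assumptions made for the example.

```python
from itertools import combinations
import random

def interleavings(a, b):
    """Yield every merge of a and b that keeps each sequence's internal order."""
    total = len(a) + len(b)
    # Choose which positions in the merged schedule belong to sequence a.
    for positions in combinations(range(total), len(a)):
        slots = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(total)]

def pick_schedule(a, b, seed):
    """Pick one interleaving deterministically from a seed (replayable)."""
    schedules = list(interleavings(a, b))
    return random.Random(seed).choice(schedules)

# Hypothetical per-service operation sequences.
api_ops = ["api:connect", "api:write"]
db_ops = ["db:write", "db:fsync"]

all_schedules = list(interleavings(api_ops, db_ops))
print(len(all_schedules))  # 6 orderings: C(4, 2)
```

The number of orderings grows combinatorially with the number of operations, which is why replaying a single failing schedule from its seed is usually more practical than re-running the whole exploration.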
Starlark specs
Topology, faults, and assertions in one .star file. No
YAML. No separate config language. The spec is executable code.
Two modes
Run local binaries with binary= or real infrastructure
(Postgres, Redis, Kafka) in Docker containers with image=.
Event log & traces
Every intercepted syscall is recorded with vector clocks. Temporal
assertions: assert_eventually(), assert_never(), assert_within().
ShiViz visualization support.
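For readers unfamiliar with vector clocks, the following Python sketch shows the standard technique in general terms: each node keeps a counter per node, and comparing two clocks tells you whether one event causally precedes another. This is not Faultbox's event log format; the function names and the "api"/"db" node labels are assumptions for illustration.

```python
def tick(clock, node):
    """Return a copy of clock with node's counter advanced (a local event)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def merge(local, received, node):
    """On receipt: take the element-wise maximum, then tick the receiver."""
    keys = set(local) | set(received)
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in keys}
    return tick(merged, node)

def happened_before(a, b):
    """True if the event stamped a causally precedes the event stamped b."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    strictly_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and strictly_less

api_send = tick({}, "api")             # api sends a request
db_recv = merge({}, api_send, "db")    # db receives it
db_other = tick({}, "db")              # an unrelated local event on db

print(happened_before(api_send, db_recv))
print(happened_before(api_send, db_other), happened_before(db_other, api_send))
```

When neither event precedes the other, as with api_send and db_other here, the events are concurrent. Temporal assertions over a multi-service log need exactly this distinction, because wall-clock timestamps alone cannot provide it.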
How it works
Powered by seccomp-notify: no ptrace, no eBPF, no code instrumentation. Faults are injected in the kernel, invisible to the target process.
Built for LLM agents
LLM agents write code. But who tests what happens when the database crashes, the network drops, or the disk fills up? Faultbox closes the loop.
Your LLM agent builds a microservice. It writes handlers, connects to Postgres, adds Redis caching.
One command from docker-compose. Every dependency gets fault scenarios: disk failures, network drops, slow queries.
faultbox init --from-compose
JSON output with diagnostics: "write fault fired 3 times but service returned 200 — missing error handling in the persist path."
The agent reads the diagnostic, finds the code, adds error handling. Runs tests again. All pass. Commits with confidence.
faultbox init --claude creates slash commands and MCP config. Zero configuration.
Every LLM agent writing microservices needs to answer one question:
"What happens when things break?"
Faultbox is that answer.
LLM Integration Guide