Features
v0.2.0 · Apache 2.0 · Linux (macOS via Lima)
Syscall-Level Fault Injection
Intercept any syscall via Linux's seccomp-notify and decide — allow, deny, or delay. No eBPF, no ptrace, no code changes. Works on any binary: Go, Rust, Java, Python, C.
write automatically covers write, writev,
pwrite64. Think in operations, not syscall numbers.
Fault only writes to /data/*.wal — stdout, TCP, and other writes
are unaffected. Powered by fd→path resolution via /proc.
deny("EIO", probability="30%") or deny("EIO", trigger="after=5") —
intermittent failures and trigger-on-Nth-call.
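The three ideas above (syscall families, probabilistic denial, and an after-N trigger) can be modeled in a few lines of plain Python. This is a toy sketch, not Faultbox's implementation: the `Deny` class, the `SYSCALL_FAMILIES` table, and the reading of `after=5` as "let the first five matching calls through" are all assumptions made for illustration.

```python
import random

# Hypothetical family table: one logical operation name covers several syscalls.
SYSCALL_FAMILIES = {
    "write": {"write", "writev", "pwrite64"},
}


class Deny:
    """Toy model of a deny rule with an optional probability or after-N trigger."""

    def __init__(self, errno_name, probability=1.0, after=0, seed=0):
        self.errno_name = errno_name
        self.probability = probability   # chance that an eligible call is denied
        self.after = after               # matching calls allowed before faults start
        self.calls = 0                   # matching calls seen so far
        self.rng = random.Random(seed)   # seeded so a run can be repeated

    def decide(self, operation, syscall):
        # Syscalls outside the operation's family are never faulted.
        if syscall not in SYSCALL_FAMILIES.get(operation, {operation}):
            return "allow"
        self.calls += 1
        if self.calls <= self.after:
            return "allow"
        if self.rng.random() < self.probability:
            return "deny:" + self.errno_name
        return "allow"


rule = Deny("EIO", after=5)
decisions = [rule.decide("write", "pwrite64") for _ in range(7)]
```

With these assumptions, the first five `pwrite64` calls are allowed and the sixth and seventh are denied with `EIO`, while a `read` call is never touched by the rule.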
Protocol-Level Fault Injection
Inject faults at the protocol level via transparent proxy. Target specific HTTP paths, SQL queries, Redis commands, or Kafka topics — without touching the network stack.
fault(service) = syscall level. fault(service.interface) = protocol
level. Same builtin, different dispatch.
Return HTTP 503 for POST /orders, inject Postgres query errors,
drop Kafka messages on specific topics.
Deterministic Exploration
Control syscall ordering across services with hold() and
release(). Explore all possible interleavings automatically
with --explore. Replay any failure with its seed.
parallel(fn1, fn2) runs operations concurrently.
Faultbox controls which syscall proceeds first.
Every test run has a seed. Failed? Replay with --seed 42 for
identical interleaving — deterministic debugging.
--explore=all tries every permutation (K! orderings of the K operations being interleaved).
--explore=sample randomly samples for faster coverage.
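To make the cost of exhaustive exploration concrete, here is a small plain-Python sketch of the two strategies: enumerating every ordering versus drawing a seeded random sample. The function names and the example operation labels are hypothetical, and this is not how Faultbox schedules syscalls internally; it only illustrates why K operations yield K! orderings and why a fixed seed makes a sampled run repeatable.

```python
import itertools
import math
import random


def all_orderings(ops):
    """Every permutation of the given operations (K! tuples for K operations)."""
    return list(itertools.permutations(ops))


def sample_orderings(ops, count, seed):
    """Draw `count` random orderings; the same seed always yields the same list."""
    rng = random.Random(seed)
    orderings = []
    for _ in range(count):
        order = list(ops)
        rng.shuffle(order)
        orderings.append(tuple(order))
    return orderings


# Hypothetical labels for three syscalls that are being held.
held = ["orders:write", "inventory:connect", "db:fsync"]
every = all_orderings(held)
sampled = sample_orderings(held, 4, seed=42)
```

Three held operations already give six orderings and ten give more than 3.6 million, which is why a sampling mode is useful for larger systems.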
Starlark Specs
Topology, faults, and assertions in one .star file.
Starlark is a Python dialect: if you know Python, its syntax will already be familiar.
No YAML, no separate config language. The spec is executable code.
service(), interface(), depends_on,
healthcheck — declare topology as code.
assert_eq(), assert_eventually(),
assert_never(), assert_before() — value checks
and temporal properties on the syscall trace.
Register happy paths with scenario(), then
faultbox generate creates failure tests automatically.
Event Log & Traces
Every intercepted syscall is recorded with vector clocks, service attribution, and file paths. Assert on internal behavior, not just inputs and outputs.
"The WAL write happened before the response" —
assert_before() proves ordering guarantees.
--shiviz trace.shiviz produces a space-time diagram
with causal arrows between services.
Capture before/after a refactor. faultbox diff shows
exactly what behavioral changes you introduced.
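The ordering claims above rest on vector clocks. As a reminder of how a happens-before check works on such clocks, here is a minimal plain-Python sketch. The clock representation (a dict from service name to counter) and the example values are assumptions for illustration; the source does not describe Faultbox's trace format at this level.

```python
def happened_before(a, b):
    """True if vector clock `a` causally precedes vector clock `b`.

    `a` precedes `b` when every component of `a` is <= the matching component
    of `b` and at least one is strictly smaller. Missing entries count as 0.
    """
    services = set(a) | set(b)
    all_le = all(a.get(s, 0) <= b.get(s, 0) for s in services)
    any_lt = any(a.get(s, 0) < b.get(s, 0) for s in services)
    return all_le and any_lt


# Hypothetical clocks attached to three recorded events.
wal_write = {"orders": 2}
response = {"orders": 3, "gateway": 1}
unrelated = {"inventory": 1}
```

If neither clock precedes the other, the two events are concurrent, and no ordering between them can be asserted from the trace alone.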
Binary & Container Modes
Run local binaries for fast development, or real infrastructure in Docker containers for integration testing. Same spec, same assertions, same faults.
binary="./my-service" — fork+exec with seccomp filter.
Fastest iteration, no Docker needed.
image="postgres:16" — Docker containers with faultbox-shim
entrypoint. Test against real Postgres, Redis, Kafka.
Monitors & Network Partitions
Define safety invariants as monitors that run on every syscall event. Simulate network partitions between specific services.
Monitors are callbacks that fire on every matching event and fail immediately if an invariant is violated.
partition(orders, inventory, run=scenario) — bidirectional
network split, other connectivity intact.
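The monitor idea can be sketched in plain Python as a pair of functions per invariant: one that selects events and one that checks them. Everything here (the event dicts, `run_monitors`, `InvariantViolation`, and the example invariant) is hypothetical and only illustrates the fail-fast behavior described above, not Faultbox's monitor API.

```python
class InvariantViolation(Exception):
    """Raised as soon as any monitor's check fails."""


def run_monitors(events, monitors):
    """Feed each event to every monitor whose predicate matches it.

    `monitors` is a list of (matches, check) function pairs. Processing stops
    at the first event for which a matching monitor's check returns False.
    """
    for index, event in enumerate(events):
        for matches, check in monitors:
            if matches(event) and not check(event):
                raise InvariantViolation(
                    "event %d violated an invariant: %r" % (index, event)
                )
    return len(events)


# Hypothetical invariant: the inventory service only writes under /data/.
monitors = [
    (
        lambda e: e["service"] == "inventory" and e["syscall"] == "write",
        lambda e: e["path"].startswith("/data/"),
    )
]

good_events = [
    {"service": "inventory", "syscall": "write", "path": "/data/stock.wal"},
    {"service": "orders", "syscall": "write", "path": "/tmp/cache"},
]
bad_events = good_events + [
    {"service": "inventory", "syscall": "write", "path": "/tmp/scratch"},
]
```

Running the monitors over `good_events` processes both events, while `bad_events` raises on the third event without looking any further.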
Named Operations
Group related syscalls into logical operations. Fault "persist" instead of "write + fsync". Path filters target specific files.
ops={"persist": op(syscalls=["write","fsync"], path="*.wal")} —
fault the WAL persist operation, not individual syscalls.
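To show how a named operation with a path filter narrows what gets faulted, here is a plain-Python sketch of the matching step. The `op` function below is a stand-in that only mimics the shape of the spec snippet above, `matching_ops` is a hypothetical helper, and using shell-style globbing via `fnmatch` is an assumption about how a pattern such as `*.wal` is interpreted.

```python
from fnmatch import fnmatch


def op(syscalls, path=None):
    """Stand-in for a named-operation definition: a syscall set plus an optional glob."""
    return {"syscalls": set(syscalls), "path": path}


ops = {"persist": op(syscalls=["write", "fsync"], path="*.wal")}


def matching_ops(ops, syscall, path):
    """Names of the operations that a (syscall, resolved path) pair belongs to."""
    names = []
    for name, spec in ops.items():
        if syscall not in spec["syscalls"]:
            continue
        if spec["path"] is not None and not fnmatch(path, spec["path"]):
            continue
        names.append(name)
    return names
```

Under these assumptions an `fsync` on `/data/orders.wal` counts as `persist`, while a `write` to `/dev/stdout` or a `read` of the WAL file matches nothing and would pass through untouched.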
LLM-First Design
New in v0.2.0
Faultbox is designed for both human engineers and LLM agents. Structured JSON output, MCP server, Claude Code integration — everything an agent needs for an autonomous code → test → fix loop.
faultbox mcp — 6 tools for Claude, Cursor, and any MCP client.
Run tests, generate specs, analyze failures natively.
--format json — machine-parseable results with fault info,
syscall summary, and actionable diagnostics.
faultbox init --claude — slash commands (/fault-test,
/fault-generate, /fault-diagnose) and auto-MCP config.
faultbox init --from-compose — zero-effort spec generation.
Detects protocols, wires dependencies, generates happy-path tests.
Not just "test failed" — structured hints like "write fault fired but service returned 200 — missing error handling in persist path."
ghcr.io/faultbox/faultbox image + GitHub Action for
automated fault testing on every PR.
Version History
| Version | Highlights |
|---|---|
| v0.2.0 (current) | LLM-first: --format json, MCP server, init --from-compose, init --claude, structured diagnostics, Docker image, GitHub Action |
| v0.1.0 | Initial release: syscall & protocol fault injection, Starlark specs, deterministic exploration, containers, event log, scenarios, named operations, 10 protocol proxies |