Syscall-Level Fault Injection

Intercept any syscall via Linux's seccomp-notify and decide — allow, deny, or delay. No eBPF, no ptrace, no code changes. Works on any binary: Go, Rust, Java, Python, C.

Syscall families

The write family automatically covers write, writev, and pwrite64. Think in operations, not syscall numbers.
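As a rough illustration of the family idea (not Faultbox's actual table), a lookup like the one below expands an operation name into the syscalls it covers. Only the write entry comes from the text above; the function names are hypothetical.

```python
# Hypothetical family table: only the "write" entry is taken from the text.
SYSCALL_FAMILIES = {
    "write": ("write", "writev", "pwrite64"),
}


def expand_family(name):
    """Return the syscalls a family name covers; unknown names map to themselves."""
    return SYSCALL_FAMILIES.get(name, (name,))


def matches(rule_name, syscall):
    """True if an intercepted syscall falls under the rule's operation name."""
    return syscall in expand_family(rule_name)
```

With this shape, a rule written against "write" also catches pwrite64 without the spec author listing it.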

Path targeting

Fault only writes to /data/*.wal — stdout, TCP, and other writes are unaffected. Powered by fd→path resolution via /proc.
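A minimal sketch of how fd→path resolution through /proc can work on Linux, assuming a plain glob match on the resolved target. The helper names are illustrative, not Faultbox internals.

```python
import fnmatch
import os


def resolve_fd_path(pid, fd):
    """Resolve a file descriptor of process `pid` to its target via /proc (Linux only)."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")


def should_fault(target, pattern="/data/*.wal"):
    """Glob-match the resolved target against the rule's path pattern."""
    return fnmatch.fnmatch(target, pattern)
```

Sockets and pipes resolve to names such as socket:[4242] rather than filesystem paths, which is why a path pattern leaves TCP and stdout writes alone. Note that fnmatch lets * match / as well, so a stricter matcher would be needed to exclude subdirectories.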

Probabilistic & triggered

deny("EIO", probability="30%") or deny("EIO", trigger="after=5") — intermittent failures and trigger-on-Nth-call.
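The decision logic behind such a rule can be sketched as below. This is an illustration only: it assumes that after=5 lets the first five matching calls through and makes later calls eligible for the fault, which the text does not spell out. DenyRule and its fields are hypothetical names.

```python
import random


def parse_probability(text):
    """Turn a string such as "30%" into a fraction between 0 and 1."""
    return float(text.rstrip("%")) / 100.0


class DenyRule:
    """Decides per call whether to allow it or fail it with an errno name."""

    def __init__(self, errno_name, probability="100%", after=0, rng=None):
        self.errno_name = errno_name
        self.probability = parse_probability(probability)
        self.after = after          # assumed meaning: calls to let through first
        self.calls = 0
        self.rng = rng or random.Random()

    def decide(self):
        self.calls += 1
        if self.calls <= self.after:
            return "allow"
        if self.rng.random() < self.probability:
            return self.errno_name
        return "allow"
```

Passing a seeded random.Random keeps an intermittent rule reproducible from run to run.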

Protocol-Level Fault Injection

Inject faults at the protocol level via transparent proxy. Target specific HTTP paths, SQL queries, Redis commands, or Kafka topics — without touching the network stack.

HTTP, gRPC, PostgreSQL, MySQL, Redis, Kafka, NATS, MongoDB, AMQP, Memcached, TCP

Unified API

fault(service) = syscall level. fault(service.interface) = protocol level. Same builtin, different dispatch.

Response rewriting

Return HTTP 503 for POST /orders, inject Postgres query errors, drop Kafka messages on specific topics.
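For the HTTP case, response rewriting in a proxy boils down to matching a request and substituting the status. The sketch below uses hypothetical names (HttpRule, rewrite_status) and exact-match paths; it does not reflect how Faultbox's proxies are built.

```python
from dataclasses import dataclass


@dataclass
class HttpRule:
    method: str
    path: str
    status: int


def rewrite_status(rules, method, path, upstream_status):
    """Return the injected status for the first matching rule, else the upstream status."""
    for rule in rules:
        if rule.method == method and rule.path == path:
            return rule.status
    return upstream_status
```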

Deterministic Exploration

Control syscall ordering across services with hold() and release(). Explore all possible interleavings automatically with --explore. Replay any failure with its seed.

Parallel execution

parallel(fn1, fn2) runs operations concurrently. Faultbox controls which syscall proceeds first.
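The idea of holding concurrent operations and releasing them in a chosen order can be sketched with plain threads. Here each worker blocks at a gate that stands in for a held syscall; Gate, wait_turn, and run_in_order are illustrative names, loosely mirroring hold() and release() rather than reproducing them.

```python
import threading


class Gate:
    """Blocks each named operation until the controller releases it."""

    def __init__(self, names):
        self._events = {name: threading.Event() for name in names}

    def wait_turn(self, name):
        self._events[name].wait()

    def release(self, name):
        self._events[name].set()


def run_in_order(order):
    """Start all workers at once, then let them proceed in the given order."""
    gate = Gate(order)
    finished = {name: threading.Event() for name in order}
    log = []

    def worker(name):
        gate.wait_turn(name)      # stands in for a held syscall
        log.append(name)          # stands in for the syscall's side effect
        finished[name].set()

    # Start order is deliberately unrelated to the release order.
    threads = [threading.Thread(target=worker, args=(name,)) for name in sorted(order)]
    for thread in threads:
        thread.start()
    for name in order:
        gate.release(name)
        finished[name].wait()     # release the next one only after this one is done
    for thread in threads:
        thread.join()
    return log
```

The workers start together, yet the observed order is always the one the controller picked.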

Seed replay

Every test run has a seed. Failed? Replay with --seed 42 for identical interleaving — deterministic debugging.
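Seed replay rests on a simple property: a pseudo-random generator seeded with the same value makes the same choices. The sketch below assumes the interleaving is chosen by shuffling the pending operations; pick_interleaving is a hypothetical helper, not Faultbox's scheduler.

```python
import random


def pick_interleaving(operations, seed):
    """Choose an ordering of the operations that depends only on the seed."""
    rng = random.Random(seed)     # private generator, unaffected by global state
    order = list(operations)
    rng.shuffle(order)
    return order
```

Calling it twice with seed 42 returns the same ordering, which is what makes a failing run reproducible.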

Exhaustive mode

--explore=all tries every permutation of the K operations being ordered (K! orderings), so cost grows factorially with K. --explore=sample randomly samples orderings for faster, non-exhaustive coverage.
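To get a feel for the K! growth, exhaustive enumeration can be sketched with the standard library. This only illustrates the combinatorics; how Faultbox enumerates orderings internally is not described here.

```python
import itertools
import math


def all_orderings(operations):
    """Enumerate every ordering of the given operations (K! of them)."""
    return list(itertools.permutations(operations))


def ordering_count(k):
    """Number of orderings exhaustive exploration has to cover for K operations."""
    return math.factorial(k)
```

Four operations give 24 orderings and ten give 3,628,800, which is where sampling becomes the practical choice.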

Starlark Specs

Topology, faults, and assertions in one .star file. Starlark is a Python dialect — if you know Python, you know Starlark. No YAML, no separate config language. The spec is executable code.

Service declarations

service(), interface(), depends_on, healthcheck — declare topology as code.

Assertions

assert_eq(), assert_eventually(), assert_never(), assert_before() — value checks and temporal properties on the syscall trace.
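Temporal properties of this kind reduce to simple checks over an ordered event list. The sketch below treats a trace as a list of dicts with hypothetical fields (service, syscall) and uses its own helper names; the real assert_* builtins may take different arguments.

```python
def eventually(trace, predicate):
    """True if at least one event in the trace satisfies the predicate."""
    return any(predicate(event) for event in trace)


def never(trace, predicate):
    """True if no event in the trace satisfies the predicate."""
    return not any(predicate(event) for event in trace)


def before(trace, first, second):
    """True if the earliest event matching `first` precedes the earliest matching `second`."""
    first_idx = next((i for i, e in enumerate(trace) if first(e)), None)
    second_idx = next((i for i, e in enumerate(trace) if second(e)), None)
    return first_idx is not None and second_idx is not None and first_idx < second_idx
```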

Scenarios & generation

Register happy paths with scenario(), then faultbox generate creates failure tests automatically.

Event Log & Traces

Every intercepted syscall is recorded with vector clocks, service attribution, and file paths. Assert on internal behavior, not just inputs and outputs.
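Vector clocks are what make cross-service ordering checkable. The textbook comparison can be sketched as follows, with clocks represented as dicts from service name to counter; this is the general technique, not Faultbox's data layout.

```python
def merge(local, received, node):
    """On receive: take the element-wise maximum, then tick the receiver's own entry."""
    merged = {k: max(local.get(k, 0), received.get(k, 0)) for k in set(local) | set(received)}
    merged[node] = merged.get(node, 0) + 1
    return merged


def happened_before(a, b):
    """a -> b iff every entry of a is <= b's and at least one is strictly smaller."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_smaller = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and some_smaller


def concurrent(a, b):
    """Neither event causally precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)
```

Two events on different services with no message between them come out as concurrent, so no ordering assertion about them can be proven from the trace alone.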

Temporal assertions

"The WAL write happened before the response" — assert_before() proves ordering guarantees.

ShiViz visualization

--shiviz trace.shiviz produces a space-time diagram with causal arrows between services.

Normalized traces

Capture before/after a refactor. faultbox diff shows exactly what behavioral changes you introduced.
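One plausible reading of "normalized" is that run-specific fields are dropped before comparison, so only behavioral differences remain. The sketch below assumes pid and timestamp are the volatile fields, which is a guess, and uses difflib for the comparison; it is not the faultbox diff implementation.

```python
import difflib

# Assumed volatile fields; the real normalization rules are not documented here.
VOLATILE_FIELDS = ("pid", "timestamp")


def normalize(trace):
    """Render each event as a stable text line without run-specific fields."""
    lines = []
    for event in trace:
        stable = {k: v for k, v in event.items() if k not in VOLATILE_FIELDS}
        lines.append(" ".join(f"{k}={stable[k]}" for k in sorted(stable)))
    return lines


def diff_traces(before, after):
    """Return unified-diff lines between two normalized traces (empty if identical)."""
    return list(difflib.unified_diff(normalize(before), normalize(after),
                                     "before", "after", lineterm=""))
```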

Binary & Container Modes

Run local binaries for fast development, or real infrastructure in Docker containers for integration testing. Same spec, same assertions, same faults.

Binary mode

binary="./my-service" — fork+exec with seccomp filter. Fastest iteration, no Docker needed.

Container mode

image="postgres:16" — Docker containers with faultbox-shim entrypoint. Test against real Postgres, Redis, Kafka.

Monitors & Network Partitions

Define safety invariants as monitors that run on every syscall event. Simulate network partitions between specific services.

Monitors

Callbacks that fire on every matching event — fail immediately if an invariant is violated.
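A monitor in this sense is a filter plus an invariant, checked as events arrive. The following is a minimal sketch with hypothetical names (Monitor, feed) and dict-shaped events; the actual monitor API is not shown in this document.

```python
class InvariantViolation(Exception):
    """Raised as soon as a monitor sees an event that breaks its invariant."""


class Monitor:
    def __init__(self, name, matches, holds):
        self.name = name
        self.matches = matches    # which events this monitor cares about
        self.holds = holds        # the invariant that must be true for them

    def observe(self, event):
        if self.matches(event) and not self.holds(event):
            raise InvariantViolation(f"{self.name}: {event}")


def feed(events, monitors):
    """Run every monitor on every event, stopping at the first violation."""
    for event in events:
        for monitor in monitors:
            monitor.observe(event)
```

Failing at the offending event, rather than after the run, keeps the failure next to the syscall that caused it.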

Partitions

partition(orders, inventory, run=scenario) — bidirectional network split, other connectivity intact.
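The semantics described here, a split between one pair that leaves everything else connected, can be modeled as a set of blocked unordered pairs. The Network class below is a toy model for reasoning about the behavior, not how Faultbox enforces the partition.

```python
class Network:
    """Toy connectivity model: every pair can talk unless it is partitioned."""

    def __init__(self):
        self._blocked = set()

    def partition(self, a, b):
        # An unordered pair makes the split bidirectional.
        self._blocked.add(frozenset((a, b)))

    def heal(self, a, b):
        self._blocked.discard(frozenset((a, b)))

    def can_reach(self, src, dst):
        return frozenset((src, dst)) not in self._blocked
```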

Named Operations

Group related syscalls into logical operations. Fault "persist" instead of "write + fsync". Path filters target specific files.

Semantic faults

ops={"persist": op(syscalls=["write","fsync"], path="*.wal")} — fault the WAL persist operation, not individual syscalls.
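Matching an intercepted call against a named operation needs only the syscall name and the resolved path. The sketch below mirrors the persist example in plain Python; the Op class and operation_for helper are illustrative stand-ins, and glob matching via fnmatch is an assumption.

```python
import fnmatch
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    syscalls: tuple
    path: str = "*"

    def matches(self, syscall, path):
        """True if the call belongs to this operation and targets a matching file."""
        return syscall in self.syscalls and fnmatch.fnmatch(path, self.path)


# Mirrors the spec fragment above: write + fsync on *.wal files.
OPS = {"persist": Op(syscalls=("write", "fsync"), path="*.wal")}


def operation_for(syscall, path):
    """Return the name of the first operation that covers this call, if any."""
    for name, op in OPS.items():
        if op.matches(syscall, path):
            return name
    return None
```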

LLM-First Design

New in v0.2.0

Faultbox is designed for both human engineers and LLM agents. Structured JSON output, MCP server, Claude Code integration — everything an agent needs for an autonomous code → test → fix loop.

MCP server

faultbox mcp — 6 tools for Claude, Cursor, and any MCP client. Run tests, generate specs, analyze failures natively.

Structured output

--format json — machine-parseable results with fault info, syscall summary, and actionable diagnostics.

Claude Code commands

faultbox init --claude — slash commands (/fault-test, /fault-generate, /fault-diagnose) and auto-MCP config.

From docker-compose

faultbox init --from-compose — zero-effort spec generation. Detects protocols, wires dependencies, generates happy-path tests.

Diagnostics

Not just "test failed" — structured hints like "write fault fired but service returned 200 — missing error handling in persist path."

Docker & CI

ghcr.io/faultbox/faultbox image + GitHub Action for automated fault testing on every PR.

Version History

Version | Highlights
v0.2.0 (current) | LLM-first: --format json, MCP server, init --from-compose, init --claude, structured diagnostics, Docker image, GitHub Action
v0.1.0 | Initial release: syscall & protocol fault injection, Starlark specs, deterministic exploration, containers, event log, scenarios, named operations, 10 protocol proxies