Chapter 6: From Tests to Domains
Duration: 20 minutes
Prerequisites: Chapter 5 (Concurrency) completed
The problem with test-centric thinking
In chapters 2-5 you wrote tests like this:
def test_db_down():
    def scenario():
        resp = api.post(path="/data/key", body="value")
        assert_eq(resp.status, 500)
    fault(db, connect=deny("ECONNREFUSED"), run=scenario)
def test_disk_full():
    def scenario():
        resp = api.post(path="/data/key", body="value")
        assert_true(resp.status >= 500)
    fault(db, write=deny("ENOSPC"), run=scenario)
def test_slow_network():
    def scenario():
        resp = api.post(path="/data/key", body="value")
        assert_eq(resp.status, 200)
        assert_true(resp.duration_ms > 400)
    fault(db, write=delay("500ms"), run=scenario)
This works. But look at what happened:
- The scenario is duplicated three times: api.post(path="/data/key", body="value") appears in every test, copied and pasted.
- The faults are inlined: connect=deny("ECONNREFUSED") has no name. When you need “db down” in another test, you type it again.
- The assertions are embedded: adding a fourth fault means copying and pasting a whole test again.
With 5 scenarios and 4 fault modes, you have 20 hand-written test functions. With 10 scenarios and 8 fault modes, you have 80. The approach doesn’t scale.
The domain-centric model
Faultbox v0.3 separates testing into three independent layers:
┌─────────────────────────────────────────────────┐
│ Layer 1: WHAT THE SYSTEM DOES (scenarios)       │
│                                                 │
│ def order_flow():                               │
│     return api.post(path="/orders", ...)        │
│                                                 │
│ def health_check():                             │
│     return api.get(path="/health")              │
├─────────────────────────────────────────────────┤
│ Layer 2: WHAT CAN GO WRONG (fault assumptions)  │
│                                                 │
│ db_down = fault_assumption("db_down",           │
│     target=db, connect=deny("ECONNREFUSED"))    │
│                                                 │
│ disk_full = fault_assumption("disk_full",       │
│     target=db, write=deny("ENOSPC"))            │
├─────────────────────────────────────────────────┤
│ Layer 3: WHAT CORRECT MEANS (oracles)           │
│                                                 │
│ fault_matrix(                                   │
│     scenarios=[order_flow, health_check],       │
│     faults=[db_down, disk_full],                │
│     overrides={                                 │
│         (order_flow, db_down): lambda r: ...    │
│     })                                          │
└─────────────────────────────────────────────────┘
Each layer is defined once and reused everywhere:
- A scenario describes a user action. It doesn’t know about faults.
- A fault assumption describes a failure mode. It doesn’t know about scenarios.
- The matrix combines them. You define expected behavior where it matters.
5 scenarios + 4 faults = 9 definitions instead of 20 test functions.
Scenarios as probes
A scenario is a probe — it exercises the system and returns an observable result. No assertions inside.
BIN = "bin/linux"
db = service("db", BIN + "/mock-db",
    interface("main", "tcp", 5432),
    healthcheck = tcp("localhost:5432"),
)
api = service("api", BIN + "/mock-api",
    interface("public", "http", 8080),
    env = {"PORT": "8080", "DB_ADDR": db.main.addr},
    depends_on = [db],
    healthcheck = http("localhost:8080/health"),
)
def order_flow():
    """Place an order and return the response for external validation."""
    api.post(path="/data/mykey", body="myvalue")
    return api.get(path="/data/mykey")
scenario(order_flow)
def health_check():
    """Check API health and return the response."""
    return api.get(path="/health")
scenario(health_check)
Why no assert_eq inside? Because the same scenario runs under different
faults with different expected outcomes:
- Under no fault: status == 200
- Under db_down: status >= 500
- Under slow_network: status == 200 but duration_ms > 400
The scenario doesn’t judge — it just reports what happened.
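Here is a plain-Python model of that separation: one probe result, judged by a different oracle for each condition. The Response type and the numbers in observed are hypothetical stand-ins for what a probe might return. Only the three conditions and their expectations come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Stand-in for a probe's result; field names mirror the spec above.
    status: int
    duration_ms: int


# Hypothetical results of the same probe under three conditions.
observed = {
    "no_fault": Response(status=200, duration_ms=12),
    "db_down": Response(status=503, duration_ms=8),
    "slow_network": Response(status=200, duration_ms=512),
}

# One oracle per condition, from the list above; the probe never sees them.
oracles = {
    "no_fault": lambda r: r.status == 200,
    "db_down": lambda r: r.status >= 500,
    "slow_network": lambda r: r.status == 200 and r.duration_ms > 400,
}

for condition, response in observed.items():
    verdict = "PASS" if oracles[condition](response) else "FAIL"
    print(condition, verdict)

# Judging the db_down result with the happy-path oracle fails,
# which is why the probe itself must not contain assertions.
print(oracles["no_fault"](observed["db_down"]))  # False
```

The probe stays the same in every case. Only the oracle paired with each fault changes.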
Named fault assumptions
Instead of typing connect=deny("ECONNREFUSED") everywhere, name it:
db_down = fault_assumption("db_down",
    target = db,
    connect = deny("ECONNREFUSED"),
)
disk_full = fault_assumption("disk_full",
    target = db,
    write = deny("ENOSPC"),
)
slow_network = fault_assumption("slow_network",
    target = api,
    connect = delay("500ms"),
)
A fault assumption is a reusable failure mode. It carries:
- A name (for human readability and matrix reports)
- A target service
- The syscall-level faults to apply
You can also attach monitors — invariants that must hold whenever this fault is active:
def check_no_db_traffic(event):
    fail("traffic reached DB despite being down")
no_db_traffic = monitor(check_no_db_traffic, service="db", syscall="read")
db_down = fault_assumption("db_down",
    target = db,
    connect = deny("ECONNREFUSED"),
    monitors = [no_db_traffic],
)
Now every test that uses db_down automatically verifies that no traffic
reaches the DB. You write the invariant once.
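One way to picture a monitor is the plain-Python model below, which is not Faultbox's implementation. It assumes three things: the monitor sees each syscall event while the assumption is active, it ignores events that do not match its service and syscall filter, and fail() aborts the test. MonitorViolation is a hypothetical exception standing in for fail().

```python
from dataclasses import dataclass
from typing import Callable


class MonitorViolation(Exception):
    """Stands in for fail(): aborts the test with a message."""


@dataclass
class Monitor:
    check: Callable[[dict], None]
    service: str
    syscall: str

    def observe(self, event: dict) -> None:
        # Only events matching both filters reach the check function.
        if event["service"] == self.service and event["syscall"] == self.syscall:
            self.check(event)


def check_no_db_traffic(event: dict) -> None:
    raise MonitorViolation("traffic reached DB despite being down")


no_db_traffic = Monitor(check_no_db_traffic, service="db", syscall="read")

# Hypothetical events recorded while db_down is active: none match the filter.
for event in [{"service": "api", "syscall": "read"},
              {"service": "db", "syscall": "write"}]:
    no_db_traffic.observe(event)

# A read on the db service does match, so the invariant fires.
try:
    no_db_traffic.observe({"service": "db", "syscall": "read"})
except MonitorViolation as err:
    print("monitor failed:", err)
```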
Fault scenarios — one scenario, one fault, one oracle
The simplest composition: pair one scenario with one fault assumption and
an expect oracle that validates the result:
fault_scenario("order_db_down",
    scenario = order_flow,
    faults = db_down,
    expect = lambda r: assert_true(r.status >= 500, "should fail when DB is down"),
)
This registers test_order_db_down. When it runs:
- Installs the db_down fault rules (and its monitors)
- Calls order_flow() and captures the return value
- Passes the return value to expect, which asserts on it
- Cleans up
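The four steps can be sketched in plain Python. This sketch makes two assumptions. First, install and uninstall are hypothetical callbacks standing in for however Faultbox applies and removes its syscall rules. Second, the scenario returns a bare status code instead of a response object. The try/finally carries the key point: cleanup runs whether or not the oracle passes, and leaving expect unset makes the run a smoke test.

```python
from typing import Callable, Optional


def run_fault_scenario(install: Callable[[], None],
                       uninstall: Callable[[], None],
                       scenario: Callable[[], object],
                       expect: Optional[Callable[[object], None]] = None) -> object:
    """Model of the four steps above; not Faultbox's implementation."""
    install()                   # install the fault rules (and monitors)
    try:
        result = scenario()     # run the probe and keep its return value
        if expect is not None:  # no expect= means a smoke test
            expect(result)      # the oracle raises if the result is wrong
        return result
    finally:
        uninstall()             # clean up even when expect raises


def expect_5xx(status: int) -> None:
    assert status >= 500, "should fail when DB is down"


log = []
status = run_fault_scenario(
    install=lambda: log.append("install db_down"),
    uninstall=lambda: log.append("cleanup"),
    scenario=lambda: 503,   # hypothetical: the probe observed a 503
    expect=expect_5xx,
)
print(status, log)  # 503 ['install db_down', 'cleanup']
```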
No fault (happy path oracle):
fault_scenario("order_happy",
    scenario = order_flow,
    expect = lambda r: (
        assert_eq(r.status, 200),
        assert_eq(r.body, "myvalue"),
    ),
)
Without faults=, the scenario runs under normal conditions — useful for
validating the happy path with explicit expectations.
Smoke test (no oracle):
fault_scenario("order_disk_full_smoke",
    scenario = order_flow,
    faults = disk_full,
)
Without expect=, the test passes as long as the scenario completes
without crashing. Good for initial discovery — “does it survive this fault?”
Multiple faults simultaneously:
cascade = fault_assumption("cascade",
    faults = [db_down, slow_network],
)
fault_scenario("order_cascade",
    scenario = order_flow,
    faults = cascade,
    expect = lambda r: assert_true(r.status >= 500),
)
fault_scenario() is the right tool when you have one specific combination
to test. When you have many scenarios × many faults, use fault_matrix().
The fault matrix — the cross-product
When you have multiple scenarios and multiple fault assumptions, the matrix generates all combinations automatically:
fault_matrix(
    scenarios = [order_flow, health_check],
    faults = [db_down, disk_full, slow_network],
    default_expect = lambda r: assert_true(r != None, "must return a response"),
    overrides = {
        (order_flow, db_down): lambda r: assert_true(r.status >= 500),
        (order_flow, slow_network): lambda r: (
            assert_eq(r.status, 200),
            assert_true(r.duration_ms > 400),
        ),
        (health_check, db_down): lambda r: assert_true(r.status >= 500),
    },
)
This generates 6 tests (2 scenarios × 3 faults):
Fault Matrix: 2 scenarios × 3 faults = 6 cells
                    │ db_down       │ disk_full     │ slow_network
────────────────────┼───────────────┼───────────────┼──────────────
order_flow          │ PASS (210ms)  │ PASS (208ms)  │ PASS (910ms)
health_check        │ PASS (206ms)  │ PASS (205ms)  │ PASS (705ms)
Result: 6/6 passed
Cells without overrides use default_expect — a baseline check
(“must return something”). Cells with overrides use the specific oracle.
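The cell-by-cell oracle choice is a cross-product plus a dictionary lookup with a fallback. Below is a plain-Python model of that behavior, not Faultbox's implementation. Scenarios and faults are plain strings here, whereas the real call takes the scenario functions and assumption objects. The oracles return booleans instead of asserting. The test names follow the test_matrix_<scenario>_<fault> pattern visible in Faultbox's output for the full spec later in this chapter.

```python
from itertools import product


def build_matrix(scenarios, faults, default_expect, overrides=None):
    """Plain-Python model of fault_matrix(): one cell per (scenario, fault) pair.

    Each cell gets the override registered for its exact pair, or
    default_expect when there is none.
    """
    overrides = overrides or {}
    cells = []
    for scenario, fault in product(scenarios, faults):
        oracle = overrides.get((scenario, fault), default_expect)
        cells.append((f"test_matrix_{scenario}_{fault}", oracle))
    return cells


def must_respond(r):
    return r is not None


def must_fail(r):
    return r["status"] >= 500


cells = build_matrix(
    scenarios=["order_flow", "health_check"],
    faults=["db_down", "disk_full", "slow_network"],
    default_expect=must_respond,
    overrides={("order_flow", "db_down"): must_fail},
)
for name, oracle in cells:
    print(name, oracle.__name__)
```

Adding a scenario or a fault extends the product automatically. Only the pairs that need a sharper oracle appear in overrides.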
When to use which approach
The domain-centric model doesn’t replace the test-centric model — it builds on top of it:
| Approach | When to use | Example |
|---|---|---|
| def test_*() with inline fault() | Learning, debugging one specific case | Chapters 2-5 |
| fault_scenario() | One scenario + one fault + specific expected behavior | “When DB is down, order returns 503” |
| fault_scenario() (smoke) | Quick check: “does it survive this fault?” | No expect=, just no crash |
| fault_matrix() | Systematic coverage: many scenarios × many faults | 5 scenarios × 4 faults = 20 tests |
| faultbox generate | Discovery: let Faultbox propose failure modes | Auto-generates assumptions + matrix |
Most users start with fault_scenario() — it’s the workhorse for
individual fault tests. Graduate to fault_matrix() when you have
multiple scenarios and faults that should be cross-tested.
Composition — combining fault assumptions
Fault assumptions compose. Define simple ones and combine them:
db_down = fault_assumption("db_down",
    target = db,
    connect = deny("ECONNREFUSED"),
)
slow_network = fault_assumption("slow_network",
    target = api,
    connect = delay("500ms"),
)
# Compound failure: DB down AND slow network simultaneously.
cascade = fault_assumption("cascade",
    faults = [db_down, slow_network],
    description = "DB down + slow network",
)
Use cascade in a matrix or scenario just like any single assumption.
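Composition can be pictured as a tree in which a compound assumption applies all of its children's rules at the same time. Below is a plain-Python sketch of that flattening step. It assumes, without confirmation from this chapter, that Faultbox merges child rules this way. The Assumption class and flatten function are hypothetical, and the rule strings are labels copied from the spec above.

```python
from dataclasses import dataclass, field


@dataclass
class Assumption:
    # Hypothetical model: leaves carry rules for one target,
    # compound assumptions carry child assumptions in `faults`.
    name: str
    target: str = ""
    rules: dict = field(default_factory=dict)   # syscall -> fault action
    faults: list = field(default_factory=list)  # child assumptions


def flatten(assumption):
    """Collect every (target, syscall, action) rule in an assumption tree."""
    collected = [(assumption.target, syscall, action)
                 for syscall, action in assumption.rules.items()]
    for child in assumption.faults:
        collected.extend(flatten(child))
    return collected


db_down = Assumption("db_down", target="db", rules={"connect": 'deny("ECONNREFUSED")'})
slow_network = Assumption("slow_network", target="api", rules={"connect": 'delay("500ms")'})
cascade = Assumption("cascade", faults=[db_down, slow_network])

for rule in flatten(cascade):
    print(rule)
```

Running it prints the db rule followed by the api rule. Because flatten recurses, in this model a compound can itself contain other compounds.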
The full picture
Save this as domain-test.star:
BIN = "bin/linux"
db = service("db", BIN + "/mock-db",
    interface("main", "tcp", 5432),
    healthcheck = tcp("localhost:5432"),
)
api = service("api", BIN + "/mock-api",
    interface("public", "http", 8080),
    env = {"PORT": "8080", "DB_ADDR": db.main.addr},
    depends_on = [db],
    healthcheck = http("localhost:8080/health"),
)
# --- Layer 1: Scenarios (probes) ---
def order_flow():
    api.post(path="/data/mykey", body="myvalue")
    return api.get(path="/data/mykey")
scenario(order_flow)
def health_check():
    return api.get(path="/health")
scenario(health_check)
# --- Layer 2: Fault Assumptions (failure modes) ---
db_down = fault_assumption("db_down",
    target = db,
    connect = deny("ECONNREFUSED"),
)
disk_full = fault_assumption("disk_full",
    target = db,
    write = deny("ENOSPC"),
)
# --- Layer 3: Matrix (cross-product) ---
fault_matrix(
    scenarios = [order_flow, health_check],
    faults = [db_down, disk_full],
)
Run it:
Linux:
faultbox test domain-test.star
macOS (Lima):
make lima-run CMD="faultbox test domain-test.star"
You should see:
--- PASS: test_health_check (207ms, seed=0) ---
--- PASS: test_matrix_health_check_db_down (206ms, seed=0) ---
--- PASS: test_matrix_health_check_disk_full (208ms, seed=0) ---
--- PASS: test_matrix_order_flow_db_down (210ms, seed=0) ---
--- PASS: test_matrix_order_flow_disk_full (210ms, seed=0) ---
--- PASS: test_order_flow (208ms, seed=0) ---
Fault Matrix: 2 scenarios × 2 faults = 4 cells
                    │ db_down       │ disk_full
────────────────────┼───────────────┼──────────────
order_flow          │ PASS (210ms)  │ PASS (210ms)
health_check        │ PASS (206ms)  │ PASS (208ms)
Result: 4/4 passed
6 passed, 0 failed
4 matrix tests + 2 scenario tests = 6 tests from 2 scenarios and 2 fault assumptions. Add a third fault assumption to the faults list and the matrix grows to 6 tests automatically (8 tests in total).
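The naming and counting above follow a simple rule, sketched here in plain Python. The pattern is inferred from the sample output, not from Faultbox's documentation. Sorting the names happens to reproduce the order shown in that listing.

```python
def generated_tests(scenarios, faults):
    """Plain-Python model of the test names produced by domain-test.star.

    One test_<scenario> per registered scenario, plus one
    test_matrix_<scenario>_<fault> per matrix cell, sorted by name.
    """
    names = [f"test_{s}" for s in scenarios]
    names += [f"test_matrix_{s}_{f}" for s in scenarios for f in faults]
    return sorted(names)


scenarios = ["order_flow", "health_check"]
two_faults = generated_tests(scenarios, ["db_down", "disk_full"])
three_faults = generated_tests(scenarios, ["db_down", "disk_full", "slow_network"])

print("\n".join(two_faults))
print(len(two_faults), len(three_faults))  # 6 8
```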
What you learned
- Test-centric works for small specs but duplicates scenario + fault + assertion
- Domain-centric separates WHAT (scenarios), WHAT BREAKS (assumptions), WHAT’S CORRECT (oracles)
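- scenario(fn) registers a probe that returns observables and makes no assertions
- fault_assumption() names a reusable failure mode
- fault_scenario() pairs one scenario with one fault assumption and an optional expect oracle
- fault_matrix() generates the cross-product of scenarios and fault assumptions
- Monitors on assumptions enforce invariants across all tests
- Start with def test_*(), graduate to fault_matrix() when specs grow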
What’s next
From here forward, all tutorial examples use the domain-centric model. When you see a scenario, a fault assumption, or a matrix — that’s the standard approach for writing Faultbox specs.
Continue to:
- Part 3: Protocol-Level Faults — HTTP, database, broker faults
- Part 4: Safety & Verification — invariants, monitors, partitions
- Part 5: Advanced Features — containers, generation, event sources