Errno Reference

When injecting faults with deny(), you specify an errno — the error code the kernel returns to the target process. This reference lists the most useful errnos for fault injection testing, grouped by failure scenario.

Quick reference

| Errno | Code | Meaning | Common use |
|---|---|---|---|
| EIO | 5 | Input/output error | Disk corruption, hardware failure |
| ENOSPC | 28 | No space left on device | Disk full |
| EROFS | 30 | Read-only file system | Mounted read-only, immutable volume |
| ENOENT | 2 | No such file or directory | Missing file, deleted config |
| EACCES | 13 | Permission denied | Wrong file permissions |
| EPERM | 1 | Operation not permitted | Missing capability, security policy |
| ECONNREFUSED | 111 | Connection refused | Service down, port not listening |
| ECONNRESET | 104 | Connection reset by peer | Remote service crashed mid-request |
| ETIMEDOUT | 110 | Connection timed out | Network unreachable, firewall drop |
| EHOSTUNREACH | 113 | No route to host | Network partition, DNS failure |
| ENETUNREACH | 101 | Network is unreachable | Interface down, routing failure |
| EAGAIN | 11 | Resource temporarily unavailable | Socket buffer full, non-blocking I/O |
| ENOMEM | 12 | Out of memory | Memory pressure, OOM conditions |
| EMFILE | 24 | Too many open files | File descriptor exhaustion |
| ENFILE | 23 | Too many open files in system | System-wide fd limit |
| EEXIST | 17 | File exists | Lock file contention, create-exclusive |
| ENOTEMPTY | 39 | Directory not empty | Cleanup failure |
| ENOSYS | 38 | Function not implemented | Missing kernel feature, seccomp block |

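Note: Errno codes shown are for Linux on amd64 and arm64, which share the same numbering. Some other architectures (for example MIPS, SPARC, and Alpha) assign different numbers to several of these errnos, mostly the network ones, so refer to errnos by name rather than by number.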
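To check the numeric code and message behind an errno name on the machine you are testing, a small Python helper along these lines may be useful. It is a sketch only; describe is a hypothetical helper name and is not part of Faultbox.

```python
import errno
import os


def describe(name):
    """Return (code, message) for an errno name such as "ENOSPC"."""
    code = getattr(errno, name, None)
    if not isinstance(code, int):
        # Covers typos and names this platform does not define.
        raise ValueError(f"unknown errno name on this platform: {name}")
    return code, os.strerror(code)


if __name__ == "__main__":
    for name in ("EIO", "ENOSPC", "ECONNREFUSED"):
        code, message = describe(name)
        print(f"{name} = {code}: {message}")
```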

Disk & storage failures

EIO — I/O error

fault(db, write=deny("EIO"), run=scenario)

Simulates: Disk corruption, bad sectors, SAN disconnection, NFS timeout. The most generic I/O error — the storage layer failed but doesn’t say why.

What to test:

  • Does the service retry or fail fast?
  • Is the error surfaced to the caller (not swallowed)?
  • Does partial write leave corrupted state?
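One common way to keep a failed write from leaving corrupted state is to write to a temporary file and rename it into place only after every byte has been written and synced. The sketch below assumes a Python service. atomic_write and its write_fn parameter are hypothetical names, and write_fn exists only so that a test can substitute a failing write.

```python
import os
import tempfile


def atomic_write(path, data, write_fn=os.write):
    """Replace path with data, or leave the old file untouched on any error."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        try:
            view = memoryview(data)
            while view:
                written = write_fn(fd, view)  # may be a partial write
                view = view[written:]
            os.fsync(fd)
        finally:
            os.close(fd)
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Any failure (EIO included) removes the temp file and re-raises,
        # so the caller sees the error and the old content survives.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A fully durable version would also fsync the containing directory after the rename; that step is omitted here for brevity.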

ENOSPC — No space left on device

fault(db, write=deny("ENOSPC"), run=scenario)

Simulates: Disk full, volume quota exceeded, WAL growth beyond capacity. One of the most common production failures — logs or data fill the disk.

What to test:

  • Does the service return a meaningful error (not just “internal error”)?
  • Can the service still respond to healthchecks?
  • Does it stop accepting writes gracefully?
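The three checks above could be satisfied by a service that switches to a degraded mode on the first ENOSPC. The following is a minimal sketch under that assumption; Store, StorageFull, and the health payload are hypothetical and only illustrate the shape of the behavior.

```python
import errno
import os


class StorageFull(Exception):
    """Raised to callers instead of a generic internal error."""


class Store:
    def __init__(self, write_fn):
        self._write = write_fn        # callable that performs the real write
        self.write_blocked_by = None  # errno that disabled writes, if any

    def put(self, data):
        if self.write_blocked_by is not None:
            # Already degraded: refuse without touching the disk again.
            raise StorageFull(f"writes disabled: {os.strerror(self.write_blocked_by)}")
        try:
            self._write(data)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:
                self.write_blocked_by = exc.errno
                raise StorageFull(f"write rejected: {os.strerror(exc.errno)}") from exc
            raise

    def health(self):
        # Does no disk I/O, so it keeps answering while the disk is full.
        if self.write_blocked_by is None:
            return {"status": "ok"}
        return {"status": "degraded", "reason": os.strerror(self.write_blocked_by)}
```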

EROFS — Read-only file system

fault(db, write=deny("EROFS"), run=scenario)

Simulates: Filesystem remounted read-only after corruption detection, immutable container layers, read-only volume mount.

What to test:

  • Does the service distinguish “read-only” from “broken”?
  • Can it still serve read requests?

File access failures

ENOENT — No such file or directory

fault(db, openat=deny("ENOENT"), run=scenario)

Simulates: Missing config file, deleted data directory, unmounted volume, symlink target removed.

What to test:

  • Does the service fail with a clear error message naming the missing file?
  • Does it retry or fail immediately?
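As an illustration of a clear error that names the missing file, a Python service might wrap the open call as follows. This is a sketch; load_config and ConfigError are hypothetical names.

```python
class ConfigError(Exception):
    """Startup error that names the file the service could not find."""


def load_config(path):
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError as exc:
        # exc.filename holds the path that triggered ENOENT.
        # No retry: a missing config file will not appear by itself.
        raise ConfigError(f"config file not found: {exc.filename}") from exc
```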

EACCES — Permission denied

fault(db, openat=deny("EACCES"), run=scenario)

Simulates: Wrong file ownership after deployment, restrictive SELinux/AppArmor policy, missing group membership.

What to test:

  • Does the error message mention permissions (not just “failed to open”)?
  • Can the service recover if permissions are fixed?

EPERM — Operation not permitted

fault(db, openat=deny("EPERM"), run=scenario)

Simulates: Missing Linux capability (e.g., CAP_FOWNER when opening a file owned by another user with O_NOATIME), immutable file attribute, seccomp policy blocking the operation, mandatory access control denial.

EACCES vs EPERM: EACCES is “you don’t have permission for this specific resource.” EPERM is “you’re not allowed to do this operation at all.” In practice, many programs don’t distinguish them.
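Python is one place where the two are easy to conflate: both errnos surface as PermissionError, so code that wants to tell them apart has to look at the errno attribute. A minimal sketch, with explain_permission_error as a hypothetical helper name:

```python
import errno


def explain_permission_error(exc):
    """Turn a PermissionError into a hint that matches its errno."""
    if exc.errno == errno.EACCES:
        return "access denied for this resource (check ownership and mode bits)"
    if exc.errno == errno.EPERM:
        return "operation not permitted (check capabilities, seccomp, or MAC policy)"
    raise ValueError("not a permission error")


if __name__ == "__main__":
    for code in (errno.EACCES, errno.EPERM):
        exc = OSError(code, "injected")
        # Both construct the same exception class.
        print(type(exc).__name__, "->", explain_permission_error(exc))
```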

Network failures

ECONNREFUSED — Connection refused

fault(api, connect=deny("ECONNREFUSED"), run=scenario)

Simulates: Target service not running, port not listening, service crashed during deployment.

What to test:

  • Does the caller return 503 (not 500)?
  • Does it retry with backoff?
  • Does the error message name the target service?
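A caller that passes all three checks might look roughly like the sketch below: it retries with exponential backoff, then returns a 503 that names the upstream service. The function name, parameters, and response shape are hypothetical. The sleep parameter is injectable so that a test does not have to wait.

```python
import time


def call_upstream(service, connect, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call connect(); map repeated ECONNREFUSED to a 503 naming the service."""
    for attempt in range(attempts):
        try:
            return 200, connect()
        except ConnectionRefusedError:
            if attempt + 1 < attempts:
                # Exponential backoff: base_delay, then 2x, then 4x, ...
                sleep(base_delay * 2 ** attempt)
    return 503, f"{service} is unavailable (connection refused)"
```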

ECONNRESET — Connection reset by peer

fault(api, read=deny("ECONNRESET"), run=scenario)

Simulates: Remote service crashed mid-response, load balancer killed the connection, TCP RST from firewall.

What to test:

  • Does the caller handle partial reads?
  • Does it retry the full request (idempotent) or fail?
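Whether a retry is safe after a reset depends on whether the request is idempotent. Assuming an HTTP caller, the decision can be sketched as below; should_retry is a hypothetical helper, and the method set follows the HTTP definition of idempotent methods.

```python
# Methods that HTTP defines as idempotent; POST and PATCH are not.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}


def should_retry(method, exc):
    """Retry a reset connection only when repeating the request is safe."""
    return (
        isinstance(exc, ConnectionResetError)  # ECONNRESET
        and method.upper() in IDEMPOTENT_METHODS
    )
```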

ETIMEDOUT — Connection timed out

fault(api, connect=deny("ETIMEDOUT"), run=scenario)

Simulates: Firewall silently dropping packets (no RST), network congestion, DNS resolution timeout.

Tip: For testing timeout behavior, delay("5s") is often more realistic than deny("ETIMEDOUT"). A deny returns instantly — a real timeout makes the caller wait.

EHOSTUNREACH — No route to host

fault(api, connect=deny("EHOSTUNREACH"), run=scenario)

Simulates: Network partition, host down, routing table misconfiguration.

ENETUNREACH — Network is unreachable

fault(api, connect=deny("ENETUNREACH"), run=scenario)

Simulates: Interface down, default route missing, VPN disconnected.

Resource exhaustion

EAGAIN — Resource temporarily unavailable

fault(db, write=deny("EAGAIN"), run=scenario)

Simulates: Socket send buffer full, non-blocking I/O would block, file lock temporarily held by another process.

What to test:

  • Does the caller retry?
  • Is there a retry limit to prevent infinite loops?
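For a non-blocking socket, one way to retry EAGAIN without spinning forever is to wait for writability and cap the number of waits. The sketch below assumes a Python caller; send_all and its parameters are hypothetical names.

```python
import select


def send_all(sock, data, timeout=1.0, max_waits=50):
    """Send all of data on a non-blocking socket with a bounded number of waits."""
    view = memoryview(data)
    waits = 0
    while view:
        try:
            sent = sock.send(view)
            view = view[sent:]
        except BlockingIOError:  # EAGAIN / EWOULDBLOCK: send buffer is full
            waits += 1
            if waits > max_waits:
                raise TimeoutError(f"gave up after {max_waits} waits for a writable socket")
            # Block until the socket is writable again or the timeout expires.
            select.select([], [sock], [], timeout)
    return waits
```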

ENOMEM — Out of memory

fault(db, write=deny("ENOMEM"), run=scenario)

Simulates: Memory pressure, mmap failure, large allocation rejected.

EMFILE — Too many open files

fault(db, openat=deny("EMFILE"), run=scenario)

Simulates: File descriptor exhaustion in the process. Common when connection pools or file handles leak.

What to test:

  • Does the service report fd exhaustion clearly?
  • Can it still handle healthcheck requests?
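To report fd exhaustion clearly, a service can translate the errno into a message that includes the process limit. The following sketch assumes Python on Linux (it reads /proc/self/fd and uses the resource module); the helper names are hypothetical.

```python
import errno
import os
import resource


def fd_usage():
    """Return (open fds or None, soft limit) for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        used = len(os.listdir("/proc/self/fd"))
    except OSError:
        # Listing the directory needs a descriptor itself, so it can
        # fail in exactly the situation being reported.
        used = None
    return used, soft


def explain_open_failure(exc):
    if exc.errno == errno.EMFILE:
        used, limit = fd_usage()
        in_use = "unknown" if used is None else str(used)
        return f"process fd limit reached ({in_use} of {limit} descriptors in use)"
    if exc.errno == errno.ENFILE:
        return "system-wide file table is full"
    return f"open failed: {os.strerror(exc.errno)}"
```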

ENFILE — Too many open files in system

fault(db, openat=deny("ENFILE"), run=scenario)

Simulates: System-wide fd limit hit. Affects all processes on the host.

Data integrity

fsync failures

fault(db, fsync=deny("EIO"), run=scenario)

Simulates: Postgres fsync failure — data written to page cache but not persisted to disk. This is how real data loss happens: the write succeeds but the sync fails, and the application thinks data is durable.

What to test:

  • Does the database detect the sync failure?
  • Does it refuse to confirm the transaction?
  • Does it enter a crash-safe recovery state?

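Critical: A failed fsync must not be retried. On Linux, the kernel may drop the dirty pages after reporting the error, so a second fsync can return success even though the data was lost. Postgres historically retried in this situation; since the fix for that issue (released in PostgreSQL 12 and backported to earlier supported versions), it panics on fsync failure by default and recovers from the WAL. This is exactly the kind of bug Faultbox was built to find.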
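In application code, the safe pattern is to treat an fsync error as fatal for the data in flight and never call fsync again hoping for success. A minimal sketch, assuming a Python writer; append_durably, DurabilityError, and the injectable fsync_fn are hypothetical names.

```python
import os


class DurabilityError(Exception):
    """The record may not be on disk; the caller must not acknowledge it."""


def append_durably(fd, record, fsync_fn=os.fsync):
    view = memoryview(record)
    while view:
        view = view[os.write(fd, view):]  # handle partial writes
    try:
        fsync_fn(fd)
    except OSError as exc:
        # Deliberately no retry: a second fsync may succeed even though
        # the kernel has already discarded the unwritten pages.
        raise DurabilityError(f"fsync failed: {os.strerror(exc.errno)}") from exc
```

The caller would typically respond by refusing to confirm the transaction and restarting into recovery.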

Filesystem edge cases

EEXIST — File exists

fault(db, openat=deny("EEXIST"), run=scenario)

Simulates: Lock file already held by another process, create-exclusive (O_EXCL) failing because the file was already created, PID file from a previous crashed instance.

What to test:

  • Does the service handle “already exists” differently from “can’t create”?
  • Does it clean up stale lock files?
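A PID-file lock that handles both questions could be sketched as follows: create the file exclusively, and on EEXIST check whether the recorded owner is still alive before treating the lock as stale. This assumes a Python service on a POSIX system; the function names are hypothetical, and the stale-lock cleanup is not race-free.

```python
import os


def pid_alive(pid):
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def acquire_lock(path):
    """Return True if this process now owns the lock file at path."""
    for _ in range(2):  # one retry after removing a stale lock
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:  # EEXIST: someone created it first
            try:
                with open(path) as f:
                    owner = int(f.read().strip() or 0)
            except (FileNotFoundError, ValueError):
                owner = 0
            if owner > 0 and pid_alive(owner):
                return False  # held by a live process
            try:
                os.unlink(path)  # stale: owner crashed or file is garbage
            except FileNotFoundError:
                pass
            continue
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))
        return True
    return False
```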

ENOTEMPTY — Directory not empty

fault(db, openat=deny("ENOTEMPTY"), run=scenario)

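Simulates: Trying to remove a directory that still has files (e.g., cleanup of temp directories, log rotation removing old dirs). On a real kernel, ENOTEMPTY comes from rmdir, unlinkat with AT_REMOVEDIR, and rename rather than from openat, so returning it from openat tests how the service handles an errno it would not normally see on that call.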

ENOSYS — Function not implemented

fault(db, openat=deny("ENOSYS"), run=scenario)

Simulates: Running on a kernel that doesn’t support a specific syscall, seccomp policy blocking the operation entirely, missing filesystem feature.

What to test:

  • Does the service fall back to an alternative?
  • Does it report a clear “unsupported” error?
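A typical fallback pattern is to try the newer syscall first and drop to a portable path when the kernel answers ENOSYS. The sketch below uses os.copy_file_range (available in Python 3.8+ on Linux) as the fast path. copy_bytes and its fast_copy parameter are hypothetical names, and fast_copy is injectable so that the ENOSYS branch can be tested on any kernel.

```python
import errno
import os


def copy_bytes(src_fd, dst_fd, count, fast_copy=getattr(os, "copy_file_range", None)):
    """Copy up to count bytes; return the name of the code path that was used."""
    if fast_copy is not None:
        try:
            remaining = count
            while remaining > 0:
                n = fast_copy(src_fd, dst_fd, remaining)
                if n == 0:  # end of source file
                    break
                remaining -= n
            return "copy_file_range"
        except OSError as exc:
            if exc.errno != errno.ENOSYS:
                raise
            # ENOSYS is reported on the first call, before any bytes move,
            # so it is safe to start over with the portable path.
    remaining = count
    while remaining > 0:
        chunk = os.read(src_fd, min(remaining, 65536))
        if not chunk:
            break
        remaining -= len(chunk)
        view = memoryview(chunk)
        while view:
            view = view[os.write(dst_fd, view):]
    return "read/write"
```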

Using errnos not listed here

Linux has ~130 errnos. This reference covers the most common ones for fault injection. If you need an errno not listed here:

Step 1: Find the errno name. Run on your target Linux system:

# List all errnos:
python3 -c "import errno; print('\n'.join(f'{v}: {k}' for k,v in sorted(errno.errorcode.items())))"

# Or search for a specific error:
grep -E "EDEADLK|ELOOP|ENOLCK" /usr/include/asm-generic/errno*.h

Step 2: Use it directly in Faultbox — any valid Linux errno name works:

fault(db, write=deny("EDEADLK"), run=scenario)    # resource deadlock
fault(db, openat=deny("ELOOP"), run=scenario)      # too many symlinks
fault(db, write=deny("EDQUOT"), run=scenario)      # disk quota exceeded

Faultbox passes the errno string to the kernel — if Linux recognizes it, it works. No configuration needed.

Combining errnos with probability

Real failures are often intermittent. Use probability to fail only a fraction of calls:

# 10% of writes fail — tests retry logic
fault(db, write=deny("EIO", probability="10%"), run=scenario)

# 50% connection failures — tests circuit breaker
fault(api, connect=deny("ECONNREFUSED", probability="50%"), run=scenario)
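When choosing a probability, it helps to estimate how often a retrying caller will still see a failure. The arithmetic below assumes each call fails independently with the configured probability, which is an assumption made for this sketch and not something stated above. The function name is hypothetical.

```python
def p_all_attempts_fail(p_fail, attempts):
    """Probability that every one of `attempts` independent tries fails."""
    return p_fail ** attempts


if __name__ == "__main__":
    # Compare the two probabilities used above for a caller that tries 3 times.
    for p in (0.10, 0.50):
        print(f"p={p:.2f}, 3 attempts -> {p_all_attempts_fail(p, 3):.4f}")
```

Under that assumption, a 10% fault rate with three attempts surfaces an error on about 1 in 1,000 operations, while a 50% rate surfaces one on 1 in 8, which is usually enough to trip a circuit breaker quickly.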

Combining errnos with delay

Real failures often start with slowness before errors:

# Slow then broken — cascade simulation
fault(db,
    write=delay("2s"),
    fsync=deny("EIO"),
    run=scenario,
)