5-Minute Quickstart

Choose a Blackbox quickstart track: add effect coverage to an existing system test, or learn the first proof loop from the showcase.

Choose Your Starting Point

Blackbox has two fast starts. Use the existing-tests track (?track=existing) when you already have a system or E2E test. Use the showcase track (?track=new) when you want to learn from the showcase before wiring your own service.

Both tracks start with runtime behavior. An effect is observable boundary behavior: a database write, queue message, HTTP call, cache change, emitted event, email intent, or forbidden dependency call.
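To make the idea of an effect concrete, here is a minimal sketch of how one could be modeled as data. The `Effect` interface and `describeEffect` helper are hypothetical names for illustration only; the field names simply mirror the `boundary`, `op`, and `key` fields used in the catalog examples later in this guide and are not a published Blackbox type.

```typescript
// Hypothetical shape for an observed effect (not a Blackbox API).
interface Effect {
  boundary: string; // e.g. "postgres", "http", "sqs", "redis"
  op: string; // e.g. "INSERT", "POST", "SendMessage"
  key?: string; // e.g. a table name or route; optional, as in the catalog examples
}

// Effects a checkout flow might produce, taken from the Track 1 catalog example.
const observed: Effect[] = [
  { boundary: 'postgres', op: 'INSERT', key: 'orders' },
  { boundary: 'http', op: 'POST', key: '/payments' },
  { boundary: 'sqs', op: 'SendMessage' },
];

// Render an effect as a compact one-line label for logs or review notes.
function describeEffect(e: Effect): string {
  return e.key === undefined
    ? `${e.boundary} ${e.op}`
    : `${e.boundary} ${e.op} ${e.key}`;
}

const labels = observed.map(describeEffect);
```

Thinking of each effect as a small record like this makes the later catalog review easier: every catalog line is a claim about one such record.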

By the end, you should have one flow run with Blackbox, one effect catalog, one reviewed required or forbidden effect, and one effect coverage signal in the terminal.

Before you start

Install Blackbox first. Then choose the path that matches your repo: an existing system or E2E test, or the Blackbox showcase.

First success

One run records effects, one catalog entry is reviewed, and the next run reports a satisfied effect coverage entry.

Track 1: I already have system or E2E tests. Add Blackbox to one flow, review its effect catalog, and rerun until one catalog entry is covered.

Track 2: Showcase first. Run the Blackbox showcase, inspect the proof trail, then copy the smallest system-test shape.

Track 1: Add Effect Coverage To An Existing System Test

Use this path when you already have a system or E2E test that exercises a valuable flow. The first win is not a new suite. It is evidence for behavior your existing test already drives.

Existing tests already create useful traffic. Blackbox turns that traffic into observed effects, a reviewed effect catalog, and effect coverage on the next run.

1. Add the Blackbox wrapper and matcher

Keep the request and response assertions you already trust. Add the Blackbox system boundary and `toMatchCatalog()` at the end of one valuable flow.

import { expect, test } from './testbed.js';

test.system('checkout-flow', 'customer completes checkout', () => {
  test('existing checkout test', async ({ capture, request, system }) => {
    const response = await request.post(`${system.app.hostBaseUrl}/checkout`, {
      data: { cartId: 'cart-123', paymentMethodId: 'pm_card_visa' },
    });

    expect(response.status()).toBe(201);

    await expect(capture).toMatchCatalog();
  });
});

2. Run your normal system-test command

Blackbox does not need to replace your runner. Use the command your project already uses; the Playwright command below is only an example.

<your-system-test-command>

npx playwright test --config ./e2e/playwright.config.ts

On the first run, before a catalog entry exists for the flow, you should see a baseline-pending message.

toMatchCatalog: no catalog entry for "checkout-flow"; baseline pending.
Will be written at fixture teardown.

3. Review the effect catalog

The first useful artifact is the generated catalog. It starts as observed behavior, not a trusted contract. Review it before committing it.

specVersion: "0.1"

checkout-flow:
  requires:
    - { boundary: postgres, op: INSERT, key: orders }
    - { boundary: http, op: POST, key: /payments }
    - { boundary: sqs, op: SendMessage }

4. Tighten the catalog to one meaningful contract

`toMatchCatalog()` is already the matcher. Your job here is to edit the catalog: keep one required effect that proves useful behavior and add one `forbids` entry for behavior that must not happen.

specVersion: "0.1"

checkout-flow:
  requires:
    - { boundary: postgres, op: INSERT, key: orders }
  forbids:
    - { boundary: http, op: POST, key: /refunds }
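The sketch below illustrates what `requires` and `forbids` mean for a single run. It is a conceptual model only: the type names, `matches`, and `evaluate` are hypothetical, and the matching rule (equal `boundary` and `op`, with `key` compared only when the pattern names one) is an assumption rather than Blackbox's documented behavior.

```typescript
// Hypothetical types mirroring the catalog fields shown above.
interface EffectPattern {
  boundary: string;
  op: string;
  key?: string;
}

interface CatalogEntry {
  requires: EffectPattern[];
  forbids: EffectPattern[];
}

// Assumed rule: boundary and op must be equal; key is compared only
// when the pattern names one.
function matches(pattern: EffectPattern, effect: EffectPattern): boolean {
  return (
    pattern.boundary === effect.boundary &&
    pattern.op === effect.op &&
    (pattern.key === undefined || pattern.key === effect.key)
  );
}

// An entry is satisfied when every required pattern was observed
// and no observed effect matches a forbidden pattern.
function evaluate(entry: CatalogEntry, observed: EffectPattern[]) {
  const missing = entry.requires.filter(
    (p) => !observed.some((e) => matches(p, e)),
  );
  const violations = observed.filter((e) =>
    entry.forbids.some((p) => matches(p, e)),
  );
  return {
    missing,
    violations,
    satisfied: missing.length === 0 && violations.length === 0,
  };
}

// The tightened checkout-flow entry from this step.
const checkoutEntry: CatalogEntry = {
  requires: [{ boundary: 'postgres', op: 'INSERT', key: 'orders' }],
  forbids: [{ boundary: 'http', op: 'POST', key: '/refunds' }],
};

// The effects recorded in the generated catalog from the previous step.
const observedRun: EffectPattern[] = [
  { boundary: 'postgres', op: 'INSERT', key: 'orders' },
  { boundary: 'http', op: 'POST', key: '/payments' },
  { boundary: 'sqs', op: 'SendMessage' },
];

const result = evaluate(checkoutEntry, observedRun);
```

Under these assumptions, the payment call and the SQS message are simply not part of the contract: they neither satisfy nor violate the tightened entry.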

5. Rerun and read effect coverage

On the next run, the terminal should show whether the reviewed catalog entry was satisfied, failed, or uncovered.

effect coverage
metric           value
catalog entries  1
satisfied        1  (100%)
failed           0
uncovered        0

  entry          state      runs  why
✓ checkout-flow  satisfied  1/1
written: <coverage-dir>/coverage.json

That is the first quickstart win: the same flow now proves something about the behavior inside the system, not only its response.
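As a rough illustration of how the summary rows in that output relate to per-entry states, the sketch below tallies a set of entry states into the same four metrics. The `EntryState` type and `summarize` function are hypothetical, and the rounding of the percentage is an assumption; the actual schema of `coverage.json` is not shown in this guide.

```typescript
// Hypothetical entry states, named after the columns in the terminal output.
type EntryState = 'satisfied' | 'failed' | 'uncovered';

interface CoverageSummary {
  entries: number;
  satisfied: number;
  failed: number;
  uncovered: number;
  satisfiedPercent: number;
}

// Tally per-entry states into the summary metrics.
function summarize(states: Record<string, EntryState>): CoverageSummary {
  const names = Object.keys(states);
  const count = (wanted: EntryState): number =>
    names.filter((name) => states[name] === wanted).length;

  const satisfied = count('satisfied');
  return {
    entries: names.length,
    satisfied,
    failed: count('failed'),
    uncovered: count('uncovered'),
    // Assumed rounding; avoid dividing by zero for an empty catalog.
    satisfiedPercent:
      names.length === 0 ? 0 : Math.round((satisfied / names.length) * 100),
  };
}

const summary = summarize({ 'checkout-flow': 'satisfied' });
```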

Track 2: Showcase First

Use this path when you do not yet have a useful system test. The showcase is the fastest way to learn the effect proof loop before you write one narrow flow in your own system.

1. Run the showcase

From the Blackbox showcase repo root, run the system-test script. The subscription flow is intentionally small but effect-rich: Redis, Postgres, HTTP, and SQS all participate.

pnpm test:system

2. Inspect the proof trail

A successful showcase run should produce a catalog and coverage artifacts. Start with these three; spans and V8 payloads are diagnostics for later.

e2e/tests/__effects__/00-baseline-subscribe.system.effects.yaml
e2e/.blackbox-coverage/coverage.json
e2e/.blackbox-coverage/omcdc-propagation.md

3. Copy the smallest test shape

The important shape is `test.system(...)` plus `toMatchCatalog()`. The matcher seeds a baseline when no catalog entry exists, then enforces the reviewed catalog on later runs.

import { expect, test } from './testbed.js';

test.system('subscribe-flow', 'subscribing a user to the pro tier', () => {
  test('alice subscribes', async ({ capture, request, system }) => {
    const response = await request.post(`${system.bff.hostBaseUrl}/subscriptions`, {
      data: { userId: 'alice', paymentMethodId: 'pm_card_visa' },
    });

    expect(response.status()).toBe(201);

    await expect(capture).toMatchCatalog();
  });
});
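The seed-then-enforce behavior described for the matcher can be pictured as a two-branch decision. The sketch below is a simplified model, not the real `toMatchCatalog()` implementation: `matchCatalogSketch`, the `Catalog` type, and the use of plain string labels for effects are all hypothetical, and the sketch only proposes a baseline rather than writing a file.

```typescript
// Hypothetical catalog: flow name -> required effect labels.
type Catalog = Record<string, string[] | undefined>;

type Outcome =
  | { kind: 'baseline-pending'; proposed: string[] }
  | { kind: 'enforced'; satisfied: boolean; missing: string[] };

function matchCatalogSketch(
  catalog: Catalog,
  flow: string,
  observed: string[],
): Outcome {
  const required = catalog[flow];

  // No entry yet: propose the de-duplicated observed effects as a baseline.
  if (required === undefined) {
    const proposed = observed.filter(
      (label, index) => observed.indexOf(label) === index,
    );
    return { kind: 'baseline-pending', proposed };
  }

  // Entry exists: every required label must appear in this run.
  const missing = required.filter((label) => observed.indexOf(label) === -1);
  return { kind: 'enforced', satisfied: missing.length === 0, missing };
}

// First run: nothing in the catalog, so a baseline is proposed.
const firstRun = matchCatalogSketch({}, 'subscribe-flow', [
  'postgres INSERT subscriptions',
  'sqs SendMessage',
  'sqs SendMessage',
]);

// Later run: the reviewed entry is enforced against observed effects.
const laterRun = matchCatalogSketch(
  { 'subscribe-flow': ['postgres INSERT subscriptions'] },
  'subscribe-flow',
  ['postgres INSERT subscriptions', 'sqs SendMessage'],
);
```

The point of the model is the review gap between the two branches: the baseline is only observed behavior until someone edits and commits it.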

4. Write one narrow flow in your system

Pick a flow where input/output success is not enough: checkout, signup, webhook handling, account deletion, refund prevention, or another side-effect-heavy path.

The goal is not a broad journey. The goal is one controlled system flow whose effects matter.

5. Review the catalog and read coverage

The first custom catalog begins as observed behavior. Keep the effects that define correctness, add forbids for dangerous behavior, then rerun to see effect coverage.

specVersion: "0.1"

subscribe-flow:
  requires:
    - { boundary: redis, op: GET, key: "user:*:tier" }
    - { boundary: postgres, op: INSERT, key: subscriptions }
    - { boundary: sqs, op: SendMessage }
  forbids:
    - { boundary: postgres, op: DELETE }
    - { boundary: http, op: POST, key: /v1/refunds }
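The `"user:*:tier"` key above contains a wildcard. The sketch below shows one plausible reading, assuming `*` matches any run of characters and everything else is literal. That interpretation and the `keyMatches` helper are assumptions for illustration; check the Blackbox catalog reference for the actual pattern rules.

```typescript
// Hypothetical helper: treat "*" as "any run of characters" and match
// every other character literally.
function keyMatches(pattern: string, key: string): boolean {
  const escaped = pattern
    .split('*')
    // Escape regex metacharacters in the literal segments.
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp(`^${escaped}$`).test(key);
}

const aliceTier = keyMatches('user:*:tier', 'user:alice:tier');
const alicePlan = keyMatches('user:*:tier', 'user:alice:plan');
const exactRoute = keyMatches('/v1/refunds', '/v1/refunds');
```

A pattern without `*` behaves as an exact match under this reading, so literal keys such as `subscriptions` or `/v1/refunds` are unaffected.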

effect coverage
metric           value
catalog entries  1
satisfied        1  (100%)
failed           0
uncovered        0

  entry           state      runs  why
✓ subscribe-flow  satisfied  1/1
written: <coverage-dir>/coverage.json

After First Success

Inspect OMC/DC when you need decision evidence

OMC/DC is advanced coverage over decisions and observed effects. The normal test run writes the reports; replay is for offline rendering from saved coverage artifacts.

pnpm exec blackbox coverage replay --coverage-dir e2e/.blackbox-coverage --out reports

e2e/.blackbox-coverage/omcdc-propagation.md
e2e/.blackbox-coverage/omcdc-propagation.json
e2e/.blackbox-coverage/omcdc.html

Add behavior specs when readable specs help

Feature files are optional. Generate them when you want a readable behavior surface and feature-file drift checks beside runtime effect evidence.

pnpm exec blackbox features emit ./e2e/tests --out ./e2e/features
pnpm exec blackbox features check --features ./e2e/features --tests ./e2e/tests

Continue to Feature Files From Tests.

After Your First Successful Run

After the first effect loop works, choose the next layer deliberately:

Continue to Feature Files From Tests if you want feature files, Gherkin output, or feature-file drift checks.
Read System Effects to understand catalogs, matchers, effect coverage, and OMC/DC.
Use Troubleshooting First Run if the catalog, runner, or coverage output is missing.