Feature Files From Tests

Generate Gherkin .feature files from system and E2E tests, then check feature-file drift without making BDD mandatory.

Feature files are a readable behavior layer on top of tests that already exist.

Use this page after you have at least one system or E2E test. You can skip this layer and still use Blackbox for runtime evidence, effect catalogs, matchers, effect coverage, and OMC/DC. Add feature files when the team wants a Gherkin-shaped artifact for review, product discussion, SDD handoff, or CI checks for feature-file drift.

[Figure: Behavior specs are optional once a system test exists. Skip them, emit feature files from tests, or combine readable specs with runtime effects. Steps: 1. system test (required input); 2. emit feature (or skip); 3. drift check (keep specs current).]

Feature files do not require effect coverage. They need a test source. Effects add runtime proof when the team wants the stronger path.
Test to Feature REPL

Planned package-backed REPL flow: load a test, analyze its Arrange-Act-Assert (AAA) or Given-When-Then shape, emit Gherkin, then run the gates.

Source Test

The input can be a plain Playwright-style system test or a Blackbox BDD-DSL test. Plain tests are decompiled best-effort; DSL-authored tests preserve more intent.

test.system('subscribe-flow', 'alice subscribes to the pro tier', () => {
  test('alice is an existing user with no active subscription', async ({ request, system }) => {
    const response = await request.post(`${system.bff.hostBaseUrl}/subscriptions`, {
      data: { userId: 'alice', paymentMethodId: 'pm_card_visa' },
    });
    expect(response.status()).toBe(201);
  });
});

The BDD Context

BDD and Cucumber made an important idea popular: behavior should be readable by more than the person who wrote the test. Gherkin gave teams the Given, When, Then shape and .feature files gave behavior a reviewable home.

The hard part was maintenance. In many teams, hand-written feature files, step definitions, and test code became three things that had to agree. When the system changed, the prose could stay green-looking while the real behavior moved somewhere else.

Blackbox uses the useful part of that movement without requiring the old workflow:

| Workflow | Starts from | What you maintain | Best fit |
|---|---|---|---|
| Cucumber-style BDD | Hand-written .feature files | Feature text plus step bindings | Teams that want Gherkin to drive implementation |
| Plain system tests | Test code | Test code only | Engineering-only suites |
| Blackbox feature files | Existing system or E2E tests | Test source plus generated/reviewed feature files | Teams that want readable specs without making Gherkin the driver |

Gherkin is the format. Cucumber is not required.

What Blackbox Actually Does

Blackbox does not just stringify test names into .feature files. The feature pipeline has four jobs:

  1. Detect the test style: explicit Blackbox BDD DSL when present, otherwise Playwright-style tests.
  2. Analyze the behavior shape into phases: arrange as Given, act as When, conclude with observable Then assertions.
  3. Emit a canonical Gherkin subset: @flow, Feature, optional Background, Scenario, Scenario Outline, Rule, Examples, When, Then, and And.
  4. Gate the result with syntax validation, feature-file drift detection, and optional runtime-observation comparison.

The explicit BDD DSL gives Blackbox the strongest signal because the author already wrote scenario, given, when, then, background, or given.each. Plain Playwright tests can still be decompiled, but intent recovery is best-effort. If the test structure is unclear, lint findings are useful feedback rather than something to hide.
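
The mapping from analyzed phases to Gherkin keywords can be sketched in a few lines. This is a minimal illustration, not Blackbox's implementation: the Phase type, the AnalyzedStep shape, and the emitScenario function are hypothetical names, and the only assumptions taken from this page are the arrange/act/assert to Given/When/Then mapping and the @flow tag.

```typescript
// Illustrative only: these types and names are hypothetical, not Blackbox's API.
type Phase = "arrange" | "act" | "assert";

interface AnalyzedStep {
  phase: Phase;
  text: string;
}

// Arrange maps to Given, act to When, assert to Then.
const KEYWORD: Record<Phase, string> = {
  arrange: "Given",
  act: "When",
  assert: "Then",
};

// Render one scenario, using "And" for consecutive steps in the same phase.
function emitScenario(
  flow: string,
  feature: string,
  scenario: string,
  steps: AnalyzedStep[],
): string {
  const lines = [`@flow:${flow}`, `Feature: ${feature}`, `  Scenario: ${scenario}`];
  let previous: Phase | undefined;
  for (const step of steps) {
    const keyword = step.phase === previous ? "And" : KEYWORD[step.phase];
    lines.push(`    ${keyword} ${step.text}`);
    previous = step.phase;
  }
  return lines.join("\n");
}

const output = emitScenario(
  "subscribe-flow",
  "subscribing as a pro tier user",
  "alice is an existing user with no active subscription",
  [
    { phase: "act", text: "alice POSTs /subscriptions with a valid card" },
    { phase: "assert", text: "the response status is 201" },
    { phase: "assert", text: "the subscription is persisted in postgres" },
  ],
);
console.log(output);
```

The real pipeline also has to recover the phases from test source, which is the hard part; this sketch starts after that analysis has already happened.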

Prerequisite

Start with a test source that describes a real flow:

tests/
  checkout.system.test.ts

or, in the showcase layout:

e2e/tests/
  00-baseline-subscribe.system.test.ts

The test should be worth explaining to another human. Feature files are not a replacement for writing the test or proving runtime behavior.

Choose The Layer

There are three valid paths:

| Path | What you get | What you do not get |
|---|---|---|
| Effects only | Runtime evidence, effect catalogs, matchers, effect coverage, and OMC/DC reports | Human-readable Gherkin output |
| Feature files only | Readable .feature files and feature-file drift checks from test source | Runtime effect coverage |
| Both | Runtime-backed behavior proof plus readable feature files | More artifacts to review |

Use the smallest path that creates value. If the audience is mostly developers and the effect catalog is already clear, effects-only may be enough. If product, QA, platform, or AI-assisted workflows need a readable behavior surface, add feature files.

Generate A Feature File

Use features emit to create .feature files from Playwright or Blackbox BDD-DSL test sources:

pnpm exec blackbox features emit ./tests --out ./features

For the showcase layout:

pnpm exec blackbox features emit ./e2e/tests --out ./e2e/features

The command accepts one or more test files or directories. It can also use a step library, or skip step resolution:

pnpm exec blackbox features emit ./tests --steps ./scripts/step-lib.json
pnpm exec blackbox features emit ./tests --no-steps

The output is a Gherkin .feature file:

@flow:subscribe-flow
Feature: subscribing as a pro tier user
  Scenario: alice is an existing user with no active subscription
    When alice POSTs /subscriptions with a valid card
    Then the response status is 201
    And the subscription is persisted in postgres
    And the tier is cached in redis
    And a subscribe order is queued in SQS

Read this file as a review artifact derived from tests, not as proof by itself. Runtime proof comes from the system run, the effect catalog, and coverage reports.

Check The Behavior Grammar

Use features lint when feature files matter enough to gate. The key rule is aaa-shape:

Given* When+ Then+

That means a scenario may have setup, must exercise at least one action, and must conclude with at least one observable assertion. The linter also catches missing Then steps, empty scenarios, unresolved placeholders, overly opaque decompiled steps, and repeated setup that should become a Background.

pnpm exec blackbox features lint ./tests --fail-on error
pnpm exec blackbox features lint ./tests --json

Use this before trusting generated feature files in review. A pretty feature file is not enough; the underlying scenario needs a shape that can survive as a verification artifact.
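
To make the aaa-shape rule concrete, here is a minimal sketch of a checker for the Given* When+ Then+ pattern. It is not the Blackbox linter: hasAaaShape is a hypothetical helper, and treating And/But as continuing the previous keyword is an assumption based on ordinary Gherkin usage.

```typescript
// Hypothetical helper, not the Blackbox linter: checks Given* When+ Then+.
type Keyword = "Given" | "When" | "Then" | "And" | "But";

const CODE = { Given: "G", When: "W", Then: "T" } as const;

function hasAaaShape(keywords: Keyword[]): boolean {
  let current: "G" | "W" | "T" | undefined;
  let encoded = "";
  for (const keyword of keywords) {
    if (keyword === "And" || keyword === "But") {
      // A continuation with nothing before it has no phase to inherit.
      if (current === undefined) return false;
    } else {
      current = CODE[keyword];
    }
    encoded += current;
  }
  // Zero or more setup steps, at least one action, at least one assertion.
  return /^G*W+T+$/.test(encoded);
}

console.log(hasAaaShape(["When", "Then", "And"]));  // valid: action, then assertions
console.log(hasAaaShape(["Given", "Then"]));        // invalid: no When step
```

Encoding each step as a single letter keeps the rule readable as the same regular expression the rule name suggests.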

Check Gherkin Syntax And Feature-File Drift

Use features check after generating feature files. It runs two checks by default:

  1. Gherkin syntax validation for .feature files with the Cucumber parser.
  2. Source-vs-feature consistency between the test sources and generated feature files.
pnpm exec blackbox features check --features ./features --tests ./tests

For the showcase layout:

pnpm exec blackbox features check --features ./e2e/features --tests ./e2e/tests

Use JSON when a script, CI job, or AI agent needs structured output:

pnpm exec blackbox features check --features ./e2e/features --tests ./e2e/tests --json

You can also run only one side of the check:

pnpm exec blackbox features check --features ./features --skip-drift
pnpm exec blackbox features check --features ./features --tests ./tests --skip-syntax

For a dedicated feature-file drift check:

pnpm exec blackbox features drift --tests ./tests --features ./features

The feature-file drift kinds are missing, stale, orphan, and unparseable.
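
One way to read the four drift kinds is sketched below. The kind names come from this page, but the definitions are assumptions made for illustration, not Blackbox's documented semantics: missing is taken to mean a test with no feature file, orphan a feature file with no test source, stale a feature file that differs from what would be emitted, and unparseable a feature file that fails syntax validation. FeatureState and classifyDrift are hypothetical names.

```typescript
// Hypothetical sketch: kind names come from the docs, definitions are assumed.
type DriftKind = "missing" | "stale" | "orphan" | "unparseable";

interface FeatureState {
  expected?: string; // Gherkin that would be emitted from the test source, if a test exists
  actual?: string;   // content of the .feature file on disk, if it exists
  parses?: boolean;  // whether the on-disk file passed syntax validation
}

function classifyDrift(state: FeatureState): DriftKind | "in-sync" {
  if (state.actual === undefined) return "missing";    // no feature file on disk
  if (state.parses === false) return "unparseable";    // file exists but is not valid Gherkin
  if (state.expected === undefined) return "orphan";   // file exists but no test backs it
  return state.actual === state.expected ? "in-sync" : "stale";
}

console.log(classifyDrift({ expected: "Feature: a", actual: "Feature: b", parses: true }));
```

Under these assumptions, a stale result means the feature file should be regenerated and reviewed, while an orphan usually means a test was renamed or deleted.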

Optional Semantic Observation Gate

Feature-file drift is source-to-feature drift. It answers: did the .feature file stay synchronized with the test source?

Semantic observation comparison is runtime-to-runtime drift. It answers: did two runs produce the same meaningful observable behavior?

Use it when reshaping tests, migrating toward the BDD DSL, or comparing a baseline run to a candidate run:

pnpm exec blackbox features compare-observations --baseline ./baseline-observations --candidate ./candidate-observations --json

This gate is experimental. It compares .observation.json files from the observation reporter and treats step-boundary changes as benign, while network-call changes, assertion changes, response differences, missing tests, and different outcomes are meaningful.
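
The benign-versus-meaningful split described above can be sketched as a small classifier. This is an illustration under stated assumptions: the ObservationChange labels and the gate function are hypothetical, and the sketch does not read or assume any particular .observation.json schema.

```typescript
// Hypothetical labels for the change categories named in the text above.
type ObservationChange =
  | "step-boundary"
  | "network-call"
  | "assertion"
  | "response"
  | "missing-test"
  | "outcome";

// Only step-boundary changes are treated as benign.
const BENIGN: ReadonlySet<ObservationChange> = new Set<ObservationChange>(["step-boundary"]);

function isMeaningful(change: ObservationChange): boolean {
  return !BENIGN.has(change);
}

// The gate fails as soon as any meaningful change is present.
function gate(changes: ObservationChange[]): "pass" | "fail" {
  return changes.some(isMeaningful) ? "fail" : "pass";
}

console.log(gate(["step-boundary"]));
console.log(gate(["step-boundary", "network-call"]));
```

Keeping the benign set explicit and small means any new change category defaults to meaningful, which is the safer failure mode for a comparison gate.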

How This Fits Spec-Driven Development

Spec-driven development tools describe intended product behavior. Blackbox feature files describe system behavior as exercised by tests. They can live side by side.

The useful bridge is authored translation: a human, test author, or agent turns a product spec into a system or E2E test. From there, Blackbox can generate feature files from the test source and compare them over time. That adds a verification step to the SDD workflow without claiming that generated Gherkin replaces the original product spec.

What To Commit

Commit feature files when they are part of the durable review surface:

features/*.feature

or:

e2e/features/*.feature

If you are also using effects, commit the reviewed effect catalogs:

e2e/tests/__effects__/*.effects.yaml

Generated run outputs usually stay out of git unless CI publishes them:

.blackbox-coverage/
playwright-report/
test-results/