Feature Files From Tests
Generate Gherkin .feature files from system and E2E tests, then check feature-file drift without making BDD mandatory.
Feature files are a readable behavior layer on top of tests that already exist.
Use this page after you have at least one system or E2E test. You can skip this layer and still use Blackbox for runtime evidence, effect catalogs, matchers, effect coverage, and OMC/DC. Add feature files when the team wants a Gherkin-shaped artifact for review, product discussion, SDD handoff, or CI checks for feature-file drift.
The walkthrough below shows the flow that a future package-backed REPL will run: load a test, analyze its AAA (Given/When/Then) shape, emit Gherkin, then run the gates.
Source Test
The input can be a plain Playwright-style system test or a Blackbox BDD-DSL test. Plain tests are decompiled best-effort; DSL-authored tests preserve more intent.
```ts
test.system('subscribe-flow', 'alice subscribes to the pro tier', () => {
  test('alice is an existing user with no active subscription', async ({ request, system }) => {
    const response = await request.post(`${system.bff.hostBaseUrl}/subscriptions`, {
      data: { userId: 'alice', paymentMethodId: 'pm_card_visa' },
    });

    expect(response.status()).toBe(201);
  });
});
```

Analyzed Behavior Trace
The analyzer turns test structure into a behavior trace. The linter checks that the trace has a valid AAA shape: `Given*`, `When+`, `Then+`.
```text
adapter: playwright
feature: subscribing to the pro tier
flow: subscribe-flow

scenario: alice is an existing user with no active subscription
  when: alice POSTs /subscriptions with a valid card
  then: response status is 201

grammar:
  aaa-shape: pass
  missing-then: pass
  opaque-step: none
```

Generated Gherkin
The feature file is a readable projection from the test source. It is useful for review, but it is still checked against the source instead of trusted as disconnected prose.
```gherkin
@flow:subscribe-flow
Feature: subscribing to the pro tier

  Scenario: alice is an existing user with no active subscription
    When alice POSTs /subscriptions with a valid card
    Then the response status is 201
```

Verification Gate
The gate is two-part: Cucumber-compatible Gherkin syntax validation, plus feature-file drift detection against the test source. Runtime effects and observation comparison can add stronger gates later.
```console
$ pnpm exec blackbox features check --features ./features --tests ./e2e/tests
syntax: 1/1 .feature files parsed cleanly.
drift: no drift detected.

$ pnpm exec blackbox features lint ./e2e/tests --fail-on error
no lint findings
```

The BDD Context
BDD and Cucumber made an important idea popular: behavior should be readable by more than the person who wrote the test. Gherkin gave teams the Given/When/Then shape, and .feature files gave behavior a reviewable home.
The hard part was maintenance. In many teams, hand-written feature files, step definitions, and test code became three things that had to agree. When the system changed, the prose could stay green-looking while the real behavior moved somewhere else.
Blackbox uses the useful part of that movement without requiring the old workflow:
| Workflow | Starts from | What you maintain | Best fit |
|---|---|---|---|
| Cucumber-style BDD | Hand-written .feature files | Feature text plus step bindings | Teams that want Gherkin to drive implementation |
| Plain system tests | Test code | Test code only | Engineering-only suites |
| Blackbox feature files | Existing system or E2E tests | Test source plus generated/reviewed feature files | Teams that want readable specs without making Gherkin the driver |
Gherkin is the format. Cucumber is not required.
What Blackbox Actually Does
Blackbox does not just stringify test names into .feature files. The feature pipeline has four jobs:
- Detect the test style: explicit Blackbox BDD DSL when present, otherwise Playwright-style tests.
- Analyze the behavior shape into phases: arrange as `Given`, act as `When`, and conclude with observable `Then` assertions.
- Emit a canonical Gherkin subset: `@flow`, `Feature`, optional `Background`, `Scenario`, `Scenario Outline`, `Rule`, `Examples`, `Given`, `When`, `Then`, and `And`.
- Gate the result with syntax validation, feature-file drift detection, and optional runtime-observation comparison.
The explicit BDD DSL gives Blackbox the strongest signal because the author already wrote scenario, given, when, then, background, or given.each. Plain Playwright tests can still be decompiled, but intent recovery is best-effort. If the test structure is unclear, lint findings are useful feedback rather than something to hide.
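To make "best-effort" concrete, here is a minimal sketch of how a decompiler *might* assign phases to a plain test body. It is not Blackbox's analyzer. The `toPhases` function, the one-string-per-statement input, and the regex heuristics are all assumptions for illustration. Under those heuristics, an `expect(` call is a Then, a `request` or `page` call is a When, and anything before the first action is a Given.

```typescript
// Illustrative sketch only: not Blackbox's analyzer. The input model (one
// source string per statement) and the heuristics below are assumptions.
type Phase = 'Given' | 'When' | 'Then';

interface PhasedStep {
  phase: Phase;
  source: string;
}

// Hypothetical heuristics for a plain Playwright-style test: an expect(...)
// call is an observable assertion, and a request/page call is an action.
const ASSERTION = /^\s*(await\s+)?expect\(/;
const ACTION = /\b(request|page)\.(get|post|put|patch|delete|goto|click|fill)\(/;

function toPhases(statements: string[]): PhasedStep[] {
  const steps: PhasedStep[] = [];
  let seenAction = false;
  for (const source of statements) {
    let phase: Phase;
    if (ASSERTION.test(source)) {
      phase = 'Then';
    } else if (ACTION.test(source)) {
      phase = 'When';
      seenAction = true;
    } else {
      // Unclassified statements before the first action are setup (Given);
      // after an action they are treated as part of the act phase (When).
      phase = seenAction ? 'When' : 'Given';
    }
    steps.push({ phase, source });
  }
  return steps;
}

const phases = toPhases([
  "const userId = 'alice';",
  "const response = await request.post(`${system.bff.hostBaseUrl}/subscriptions`, { data: { userId } });",
  'expect(response.status()).toBe(201);',
]);
console.log(phases.map((step) => step.phase).join(' ')); // Given When Then
```

In a sketch like this, statements that match neither heuristic are the ones most likely to turn into vague or opaque steps. That is why an explicit DSL, where the author names each phase, gives a stronger signal.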
Prerequisite
Start with a test source that describes a real flow:
```text
tests/
  checkout.system.test.ts
```

or, in the showcase layout:

```text
e2e/tests/
  00-baseline-subscribe.system.test.ts
```

The test should be worth explaining to another human. Feature files are not a replacement for writing the test or proving runtime behavior.
Choose The Layer
There are three valid paths:
| Path | What you get | What you do not get |
|---|---|---|
| Effects only | Runtime evidence, effect catalogs, matchers, effect coverage, and OMC/DC reports | Human-readable Gherkin output |
| Feature files only | Readable .feature files and feature-file drift checks from test source | Runtime effect coverage |
| Both | Runtime-backed behavior proof plus readable feature files | More artifacts to review |
Use the smallest path that creates value. If the audience is mostly developers and the effect catalog is already clear, effects-only may be enough. If product, QA, platform, or AI-assisted workflows need a readable behavior surface, add feature files.
Generate A Feature File
Use `features emit` to create .feature files from Playwright or Blackbox BDD-DSL test sources:

```shell
pnpm exec blackbox features emit ./tests --out ./features
```

For the showcase layout:

```shell
pnpm exec blackbox features emit ./e2e/tests --out ./e2e/features
```

The command accepts one or more test files or directories. It can also use a step library, or skip step resolution:

```shell
pnpm exec blackbox features emit ./tests --steps ./scripts/step-lib.json
pnpm exec blackbox features emit ./tests --no-steps
```

The output is a Gherkin .feature file:
```gherkin
@flow:subscribe-flow
Feature: subscribing as a pro tier user

  Scenario: alice is an existing user with no active subscription
    When alice POSTs /subscriptions with a valid card
    Then the response status is 201
    And the subscription is persisted in postgres
    And the tier is cached in redis
    And a subscribe order is queued in SQS
```

Read this file as a review artifact derived from tests, not as proof by itself. Runtime proof comes from the system run, the effect catalog, and coverage reports.
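The emit step can be pictured as rendering a behavior trace to text. The sketch below is a hypothetical, simplified emitter, not Blackbox's. The `BehaviorTrace` shape and the `emitGherkin` function are assumptions loosely modeled on the analyzed trace in the walkthrough at the top of this page. It handles only `@flow`, `Feature`, `Scenario`, and steps, with no `Background`, `Rule`, or `Scenario Outline`. A keyword that repeats the previous step's keyword is rendered as `And`, matching the output above.

```typescript
// Illustrative sketch of Gherkin emission; not Blackbox's emitter.
// The trace shape is a hypothetical simplification.
type Keyword = 'Given' | 'When' | 'Then';

interface TraceStep {
  keyword: Keyword;
  text: string;
}

interface BehaviorTrace {
  flow: string;
  feature: string;
  scenarios: { name: string; steps: TraceStep[] }[];
}

// Render a trace as Gherkin. Consecutive steps with the same keyword use
// `And`, the conventional Gherkin continuation keyword.
function emitGherkin(trace: BehaviorTrace): string {
  const lines: string[] = [`@flow:${trace.flow}`, `Feature: ${trace.feature}`];
  for (const scenario of trace.scenarios) {
    lines.push('', `  Scenario: ${scenario.name}`);
    let previous: Keyword | undefined;
    for (const step of scenario.steps) {
      const keyword = step.keyword === previous ? 'And' : step.keyword;
      lines.push(`    ${keyword} ${step.text}`);
      previous = step.keyword;
    }
  }
  return lines.join('\n') + '\n';
}

const feature = emitGherkin({
  flow: 'subscribe-flow',
  feature: 'subscribing as a pro tier user',
  scenarios: [
    {
      name: 'alice is an existing user with no active subscription',
      steps: [
        { keyword: 'When', text: 'alice POSTs /subscriptions with a valid card' },
        { keyword: 'Then', text: 'the response status is 201' },
        { keyword: 'Then', text: 'the subscription is persisted in postgres' },
      ],
    },
  ],
});
console.log(feature);
```

Keeping the keyword in the trace and deciding on `And` only at render time keeps the trace easy to lint, because every step still carries its real phase.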
Check The Behavior Grammar
Use `features lint` when feature files matter enough to gate. The key rule is `aaa-shape`:

```text
Given* When+ Then+
```

That means a scenario may have setup, must exercise at least one action, and must conclude with at least one observable assertion. The linter also catches missing Then steps, empty scenarios, unresolved placeholders, overly opaque decompiled steps, and repeated setup that should become a Background.

```shell
pnpm exec blackbox features lint ./tests --fail-on error
pnpm exec blackbox features lint ./tests --json
```

Use this before trusting generated feature files in review. A pretty feature file is not enough; the underlying scenario needs a shape that can survive as a verification artifact.
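The `aaa-shape` rule reads naturally as a regular expression over step keywords. This minimal sketch is not the Blackbox linter. It encodes each keyword as its first letter and matches `^G*W+T+$`, and it assumes `And`/`But` steps were already resolved to the keyword they continue. The `lintScenario` function and the `empty-scenario` rule name are hypothetical, while `aaa-shape` and `missing-then` mirror the rule names in the trace output at the top of this page.

```typescript
// Illustrative sketch of the aaa-shape rule; not the Blackbox linter.
// Assumes And/But steps were already resolved to the keyword they continue.
type Keyword = 'Given' | 'When' | 'Then';

interface Finding {
  rule: string;
  message: string;
}

// Given* When+ Then+, written over first letters: G* W+ T+.
const AAA_SHAPE = /^G*W+T+$/;

function lintScenario(name: string, keywords: Keyword[]): Finding[] {
  if (keywords.length === 0) {
    // Hypothetical rule name for a scenario with no steps at all.
    return [{ rule: 'empty-scenario', message: `${name}: scenario has no steps` }];
  }
  const findings: Finding[] = [];
  if (!keywords.includes('Then')) {
    findings.push({ rule: 'missing-then', message: `${name}: no observable Then step` });
  }
  const encoded = keywords.map((keyword) => keyword.charAt(0)).join('');
  if (!AAA_SHAPE.test(encoded)) {
    findings.push({
      rule: 'aaa-shape',
      message: `${name}: expected Given* When+ Then+, got ${keywords.join(' ')}`,
    });
  }
  return findings;
}

console.log(lintScenario('subscribe', ['When', 'Then', 'Then']).length); // 0
console.log(lintScenario('no-assertion', ['Given', 'When']).map((f) => f.rule).join(', ')); // missing-then, aaa-shape
```

Treating the shape as a regular language keeps the rule easy to state and test. A scenario such as `When Then When Then` fails because it interleaves phases, even though it has both actions and assertions.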
Check Gherkin Syntax And Feature-File Drift
Use `features check` after generating feature files. It runs two checks by default:

- Gherkin syntax validation for `.feature` files with the Cucumber parser.
- Source-vs-feature consistency between the test sources and generated feature files.
```shell
pnpm exec blackbox features check --features ./features --tests ./tests
```

For the showcase layout:

```shell
pnpm exec blackbox features check --features ./e2e/features --tests ./e2e/tests
```

Use JSON when a script, CI job, or AI agent needs structured output:

```shell
pnpm exec blackbox features check --features ./e2e/features --tests ./e2e/tests --json
```

You can also run only one side of the check:

```shell
pnpm exec blackbox features check --features ./features --skip-drift
pnpm exec blackbox features check --features ./features --tests ./tests --skip-syntax
```

For a dedicated feature-file drift check:

```shell
pnpm exec blackbox features drift --tests ./tests --features ./features
```

The feature-file drift kinds are `missing`, `stale`, `orphan`, and `unparseable`.
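As a rough mental model, the four drift kinds can be computed from two maps. This page does not spell out their exact definitions, so the sketch assumes plausible ones. A `missing` entry is a test source with no committed feature file. A `stale` entry is a committed file that no longer matches what would be regenerated. An `orphan` is a feature file with no source test, and `unparseable` means the file failed to parse. The `detectDrift` function, keying files by flow name, and the whitespace normalization are all hypothetical, not Blackbox's behavior.

```typescript
// Illustrative sketch of feature-file drift classification; not Blackbox's
// implementation. The meaning of each drift kind here is an assumption.
type DriftKind = 'missing' | 'stale' | 'orphan' | 'unparseable';

interface Drift {
  kind: DriftKind;
  key: string;
}

// Compare text while ignoring CRLF vs LF and trailing whitespace.
function normalize(text: string): string {
  return text
    .replace(/\r\n/g, '\n')
    .split('\n')
    .map((line) => line.replace(/\s+$/, ''))
    .join('\n')
    .trim();
}

// expected: Gherkin regenerated from each test source, keyed by flow.
// actual: committed .feature text keyed by flow, or null if it failed to parse.
function detectDrift(expected: Map<string, string>, actual: Map<string, string | null>): Drift[] {
  const drift: Drift[] = [];
  for (const [key, generated] of expected) {
    const committed = actual.get(key);
    if (committed === undefined) {
      drift.push({ kind: 'missing', key });
    } else if (committed === null) {
      drift.push({ kind: 'unparseable', key });
    } else if (normalize(committed) !== normalize(generated)) {
      drift.push({ kind: 'stale', key });
    }
  }
  for (const [key, committed] of actual) {
    if (!expected.has(key)) {
      // A file with no source test is an orphan, unless it cannot even be parsed.
      drift.push({ kind: committed === null ? 'unparseable' : 'orphan', key });
    }
  }
  return drift;
}

const drift = detectDrift(
  new Map([
    ['subscribe-flow', 'Feature: subscribing\n'],
    ['cancel-flow', 'Feature: cancelling\n'],
  ]),
  new Map<string, string | null>([
    ['subscribe-flow', 'Feature: subscribing  \r\n'],
    ['legacy-flow', 'Feature: legacy\n'],
  ]),
);
console.log(drift.map((d) => `${d.kind}:${d.key}`).join(', ')); // missing:cancel-flow, orphan:legacy-flow
```

Normalizing before comparing keeps editor noise such as line endings and trailing spaces from being reported as `stale`. A real tool might normalize more or less aggressively.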
Optional Semantic Observation Gate
Feature-file drift is source-to-feature drift. It answers: did the .feature file stay synchronized with the test source?
Semantic observation comparison is runtime-to-runtime drift. It answers: did two runs produce the same meaningful observable behavior?
Use it when reshaping tests, migrating toward the BDD DSL, or comparing a baseline run to a candidate run:
```shell
pnpm exec blackbox features compare-observations --baseline ./baseline-observations --candidate ./candidate-observations --json
```

This gate is experimental. It compares .observation.json files produced by the observation reporter. Step-boundary changes are treated as benign; network-call changes, assertion changes, response differences, missing tests, and different outcomes are treated as meaningful.
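The same benign-vs-meaningful split can be sketched in a few lines. The `.observation.json` schema is not documented on this page, so the `Observation` and `ObservedEvent` types below are hypothetical stand-ins, and `compareObservations` is not the Blackbox implementation. The sketch drops step-boundary events before comparing. It then flags missing tests, changed outcomes, and any change in network calls, response statuses, or assertions.

```typescript
// Illustrative sketch of semantic observation comparison. The observation
// shape is a hypothetical stand-in for the reporter's .observation.json files.
type ObservedEvent =
  | { type: 'step'; title: string }
  | { type: 'network'; method: string; url: string; status: number }
  | { type: 'assertion'; description: string; passed: boolean };

interface Observation {
  test: string;
  outcome: 'passed' | 'failed' | 'skipped';
  events: ObservedEvent[];
}

interface Difference {
  test: string;
  reason: string;
}

// Keep only meaningful events, in order. Step markers are dropped because
// moving a step boundary is treated as benign.
function meaningful(events: ObservedEvent[]): string[] {
  const kept: string[] = [];
  for (const event of events) {
    if (event.type === 'network') {
      kept.push(`${event.method} ${event.url} -> ${event.status}`);
    } else if (event.type === 'assertion') {
      kept.push(`assert ${event.description}: ${event.passed}`);
    }
  }
  return kept;
}

function compareObservations(baseline: Observation[], candidate: Observation[]): Difference[] {
  const differences: Difference[] = [];
  const candidateByTest = new Map(candidate.map((obs) => [obs.test, obs] as const));
  const baselineTests = new Set(baseline.map((obs) => obs.test));

  for (const base of baseline) {
    const next = candidateByTest.get(base.test);
    if (next === undefined) {
      differences.push({ test: base.test, reason: 'missing in candidate' });
      continue;
    }
    if (next.outcome !== base.outcome) {
      differences.push({ test: base.test, reason: `outcome ${base.outcome} -> ${next.outcome}` });
    }
    if (meaningful(base.events).join('\n') !== meaningful(next.events).join('\n')) {
      differences.push({ test: base.test, reason: 'observable behavior changed' });
    }
  }
  for (const next of candidate) {
    if (!baselineTests.has(next.test)) {
      differences.push({ test: next.test, reason: 'missing in baseline' });
    }
  }
  return differences;
}

const subscribe: Observation = {
  test: 'subscribe',
  outcome: 'passed',
  events: [
    { type: 'step', title: 'When alice subscribes' },
    { type: 'network', method: 'POST', url: '/subscriptions', status: 201 },
    { type: 'assertion', description: 'status is 201', passed: true },
  ],
};
// Same calls and assertions without the step marker: a benign change.
const reshaped: Observation = { ...subscribe, events: subscribe.events.slice(1) };
console.log(compareObservations([subscribe], [reshaped]).length); // 0
```

Comparing an ordered list of meaningful events, rather than raw files, is what lets a test be reshaped into new steps without tripping the gate.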
How This Fits Spec-Driven Development
Spec-driven development tools describe intended product behavior. Blackbox feature files describe system behavior as exercised by tests. They can live side by side.
The useful bridge is authored translation: a human, test author, or agent turns a product spec into a system or E2E test. From there, Blackbox can generate feature files from the test source and compare them over time. That adds a verification step to the SDD workflow without claiming that generated Gherkin replaces the original product spec.
What To Commit
Commit feature files when they are part of the durable review surface:
```text
features/*.feature
```

or:

```text
e2e/features/*.feature
```

If you are also using effects, commit the reviewed effect catalogs:

```text
e2e/tests/__effects__/*.effects.yaml
```

Generated run outputs usually stay out of git unless CI publishes them:

```text
.blackbox-coverage/
playwright-report/
test-results/
```

What To Read Next
- Feature Files, BDD, and Staleness
- Generating Feature Files
- Files and Artifacts
- Spec-Driven Development