Case study · real run

20 tests for a live shop in one pass — and the failure that proved the point

Target: a live production e-commerce shop we operate · single session · Playwright + TypeScript

We pointed vibe·qa at a running online shop with no test suite and no formal test docs, and asked one thing: give us real end-to-end coverage we can trust and re-run. One session later there were 20 passing Playwright tests. The interesting part is the two that failed first.

25 pages crawled · 20 test cases, 6 use cases · 3 independent AI reviews · 20/20 passing in 65s

Starting from nothing

No documentation was provided. vibe·qa crawled the live site to depth 2 (25 pages, zero errors), then drafted 6 use cases and 20 test cases covering the homepage, product listing, category browse, product detail, add-to-cart, and login — the flows real customers actually use. Every use case and plan was written to the vault as plain markdown, so a human can read exactly what's being tested before a line of code exists.
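A crawl bounded "to depth 2" is usually a breadth-first search: depth 0 is the start page, depth 1 its links, and depth 2 the links of those. The sketch below illustrates the idea under assumptions and is not vibe·qa's crawler. `crawl`, `LinkFetcher`, and the same-origin rule are hypothetical, and link extraction is passed in as a function so the traversal runs without a browser or network.

```typescript
// Hypothetical link extractor: returns the href values found on a page.
type LinkFetcher = (url: string) => Promise<string[]>;

interface CrawlResult {
  visited: string[]; // pages loaded successfully, in breadth-first order
  errors: { url: string; message: string }[]; // pages that failed to load
}

// Breadth-first crawl from `start`, following same-origin links up to maxDepth.
export async function crawl(
  start: string,
  fetchLinks: LinkFetcher,
  maxDepth = 2,
): Promise<CrawlResult> {
  const startUrl = new URL(start);
  startUrl.hash = "";
  const origin = startUrl.origin;
  const seen = new Set<string>([startUrl.href]);
  const queue: { url: string; depth: number }[] = [{ url: startUrl.href, depth: 0 }];
  const result: CrawlResult = { visited: [], errors: [] };

  while (queue.length > 0) {
    const { url, depth } = queue.shift()!;
    let links: string[];
    try {
      links = await fetchLinks(url);
    } catch (err) {
      result.errors.push({ url, message: String(err) });
      continue;
    }
    result.visited.push(url);
    if (depth >= maxDepth) continue; // at the depth limit: record the page, don't expand it

    for (const link of links) {
      let next: URL;
      try {
        next = new URL(link, url); // resolve relative hrefs against the current page
      } catch {
        continue; // skip hrefs that can't be parsed as URLs
      }
      next.hash = ""; // "#reviews" and "#top" point at the same page
      if (next.origin === origin && !seen.has(next.href)) {
        seen.add(next.href);
        queue.push({ url: next.href, depth: depth + 1 });
      }
    }
  }
  return result;
}
```

A real fetcher would load each page (for instance with Playwright) and collect its `a[href]` values; the lengths of `visited` and `errors` then give figures like the 25 pages and zero errors reported for this run.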

Checked at every step — not just at the end

Between each stage, an independent reviewer agent inspected the work and could block it:

Stage	What was reviewed	Result
Review 1	Use-case & plan coverage	Changes required → 3 blocking + 2 advisory fixes applied
Review 2	Generated test code, before running	Approved → 2 advisories applied
Review 3	Code after the heal	Approved → 4 advisories applied

This is the opposite of "an AI wrote some tests." Each artifact is adversarially checked by a fresh set of eyes before it's allowed downstream.
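The gating rule in the table can be expressed as a small check. This is a minimal sketch that assumes a reviewer returns a verdict plus separate lists of blocking and advisory findings; `Review`, `mayProceed`, and `runGatedPipeline` are hypothetical names, not vibe·qa's actual schema.

```typescript
// Hypothetical shape of one reviewer verdict.
type Verdict = "approved" | "changes_required";

interface Review {
  stage: string;
  verdict: Verdict;
  blocking: string[]; // findings that must be fixed before the artifact moves on
  advisory: string[]; // suggestions that may be applied but never block on their own
}

// An artifact may go downstream only if it was approved with no open blockers.
export function mayProceed(review: Review): boolean {
  return review.verdict === "approved" && review.blocking.length === 0;
}

// Walk the stages in order and stop at the first one whose review blocks.
export function runGatedPipeline(reviews: Review[]): { passed: string[]; blockedAt?: string } {
  const passed: string[] = [];
  for (const review of reviews) {
    if (!mayProceed(review)) return { passed, blockedAt: review.stage };
    passed.push(review.stage);
  }
  return { passed };
}
```

Under this rule, Review 1 would hold the plan back until its three blocking findings were resolved, while advisories alone never stop an artifact.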

The failure that matters

First full run: 18 / 20 passed. Two add-to-cart tests hung waiting for the header cart to update. A naive tool would do one of two things here: delete the tests, or loosen the assertion until it goes green. vibe·qa did neither.

What the healer actually diagnosed

The trace showed the add-to-cart worked — the side-cart appeared and the real header link updated to 137 Kč | 1 ks. The test was watching the wrong element: a slide-in cart widget added extra /cart/products links, and the locator's .first() was resolving to one without the summary text. The app was fine. The selector was wrong.

The fix was a one-line locator change in a shared helper — filter to the link that actually carries the summary text — with no spec touched and no assertion weakened:

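import type { Locator, Page } from '@playwright/test';

// The header cart link reads like "137 Kč | 1 ks". The slide-in side-cart adds
// more /cart/products links without that text, so filter before taking .first().
export function headerCartLink(page: Page): Locator {
  return page
    .locator('a[href*="/cart/products"]')
    .filter({ hasText: /\d+\s*k[čc]\s*\|\s*\d+\s*ks/i })
    .first();
}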
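To see why the unfiltered `.first()` picked the wrong element, the selection can be simulated on plain strings. The header text `137 Kč | 1 ks` comes from the run; the other link texts are hypothetical stand-ins for the side-cart's links, and `CART_SUMMARY` simply reuses the helper's regex.

```typescript
// Same pattern as the filter in headerCartLink: "<price> Kč | <qty> ks".
export const CART_SUMMARY = /\d+\s*k[čc]\s*\|\s*\d+\s*ks/i;

// Simulated text of every a[href*="/cart/products"] in DOM order with the
// side-cart open. Only the last value comes from the run; the rest are hypothetical.
const cartLinks = ["", "View cart", "137 Kč | 1 ks"];

// Before the fix: .first() on the unfiltered locator takes whatever comes first.
const unfilteredFirst = cartLinks[0];

// After the fix: filter({ hasText }) keeps only summary links, then .first().
const filteredFirst = cartLinks.filter((text) => CART_SUMMARY.test(text))[0];
```

The `\s*` runs are what make the pattern sturdy: an element's text content can spread price and quantity across child elements with line breaks between them, and the regex still matches.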

The principle

A self-healing suite should make tests true, not just green. Here the repair corrected how the test observed reality; it never lowered the bar the app has to clear. Rerun: 20 / 20 in 65 seconds. If the failure had been a real product bug, it would have been escalated to a human — not patched over.
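The rule in this paragraph amounts to a decision with only two outcomes. This sketch assumes the healer can answer two questions from the trace; `FailureEvidence`, `HealDecision`, and `decideHeal` are hypothetical, and a real healer would derive these flags from screenshots, DOM snapshots, and logs rather than receive them as booleans.

```typescript
// What the healer can establish from a failing test's trace (hypothetical shape).
interface FailureEvidence {
  appReachedExpectedState: boolean; // e.g. the header really showed "137 Kč | 1 ks"
  locatorTargetedThatElement: boolean; // did the test's locator resolve to that element?
}

type HealDecision =
  | { action: "fix_locator" } // the test mis-observed reality: repair how it looks
  | { action: "escalate"; reason: string }; // hand to a human and change nothing

// Only a test that mis-observed a correctly working app may be healed automatically.
export function decideHeal(e: FailureEvidence): HealDecision {
  if (!e.appReachedExpectedState) {
    return { action: "escalate", reason: "possible product bug: expected state never appeared" };
  }
  if (!e.locatorTargetedThatElement) {
    return { action: "fix_locator" };
  }
  return { action: "escalate", reason: "app and locator both look correct; cause unknown" };
}
```

`HealDecision` deliberately has no branch for relaxing an assertion, so weakening the check is not something this policy can ever return.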

Guardrails held the whole way

On a live commercial site, a test run must not misbehave. Across every run: zero requests to checkout, order, or payment URLs; zero real personal data submitted; zero real credentials entered. Add-to-cart was exercised; nothing was ever purchased.
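One way to make "zero checkout, order, or payment requests" enforceable rather than merely observed is a URL blocklist checked on every request. This is a sketch under assumptions: `FORBIDDEN_SEGMENTS` and `isForbiddenUrl` are hypothetical, and a real shop's URL scheme would need checking before relying on the list.

```typescript
// Hypothetical blocklist of path segments that must never be requested on the live shop.
const FORBIDDEN_SEGMENTS = ["checkout", "order", "orders", "payment", "payments"];

// True if any path segment of the URL is on the blocklist (case-insensitive).
// The query string is ignored, so "?sort=order" does not trigger a match.
export function isForbiddenUrl(href: string): boolean {
  const { pathname } = new URL(href);
  return pathname
    .toLowerCase()
    .split("/")
    .some((segment) => FORBIDDEN_SEGMENTS.includes(segment));
}
```

In Playwright the predicate could back a route that aborts matching requests, for example `await page.route((url) => isForbiddenUrl(url.href), (route) => route.abort())`, so a stray click fails loudly instead of reaching checkout. Matching whole segments avoids false hits such as `/border-colors`, at the cost of missing names like `/checkout-step1`.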

Beyond the browser

The same pipeline produces more than UI tests. On other targets it has generated full API suites. One example is a pytest + httpx suite against a public API that asserts on the real status code reported in the response body, gives every test case a throwaway identity that it deletes in teardown, and, when setup fails, skips with a stated reason instead of faking green. UI E2E tests, API suites, manual test cases, load tests, and seeded test data all come out of one run.
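The teardown and honest-skip patterns carry over directly to TypeScript. The suite described above used pytest + httpx, so this is a transposition under assumptions: `ApiClient`, `Outcome`, and `withThrowawayUser` are hypothetical and do not correspond to any real library's API.

```typescript
// Hypothetical minimal API client; a real suite would wrap an HTTP library.
export interface ApiClient {
  createUser(name: string): Promise<{ id: string }>;
  deleteUser(id: string): Promise<void>;
}

type Outcome<T> =
  | { status: "ran"; value: T }
  | { status: "skipped"; reason: string };

// Give one test case its own throwaway identity, always delete it afterwards,
// and report a skip with a reason if the identity can't be created.
export async function withThrowawayUser<T>(
  client: ApiClient,
  body: (userId: string) => Promise<T>,
): Promise<Outcome<T>> {
  const name = `qa-${Date.now()}-${Math.random().toString(36).slice(2, 8)}`;
  let id: string;
  try {
    id = (await client.createUser(name)).id;
  } catch (err) {
    return { status: "skipped", reason: `setup failed: ${String(err)}` };
  }
  try {
    return { status: "ran", value: await body(id) };
  } finally {
    await client.deleteUser(id); // runs whether the body passed or threw
  }
}
```

Because deletion sits in `finally`, a failing test body still cleans up and its error still propagates, so a real failure stays a failure instead of turning into a skip.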

Want this on your app? Pick one deliverable and we'll build it for free — no commitment, on your running app, with or without docs.


Figures are from an actual 2026-04-13 vibe·qa run against a production site operated by aKoření s.r.o. Test-case counts, review stages, timings, and the heal are taken verbatim from the run's saved report and heal log.