Guide

How to get Playwright tests for an app with no documentation

The most common reason an app has no automated tests isn't laziness — it's that nobody has the specs. The people who knew the requirements left, the docs went stale three releases ago, and the behaviour now lives only in the running app. Here's how to get a real Playwright suite anyway.

The app is the spec

When there's no documentation, the running application is the source of truth. That's not a workaround — for end-to-end tests it's often the honest starting point, because E2E tests should encode what users actually experience, not what a two-year-old requirements doc claimed they would.

So the process starts by exploring the live app: crawl the reachable pages, capture the real DOM, and record the actual network calls the UI makes. From that you can reconstruct the flows that matter — sign-in, search, browse, cart, checkout — as concrete, testable use cases before writing a single test().
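As a rough sketch of the exploration step, the walk below visits same-origin pages breadth-first and keeps what each page exposed. The PageCapture shape, the explore function, and the capture callback are all hypothetical names for illustration. The callback is synchronous only to keep the example short; a real browser-driven capture would be asynchronous.

```typescript
// Hypothetical record of one explored page; not a Playwright type.
interface PageCapture {
  url: string;
  links: string[];    // absolute hrefs read from the captured DOM
  requests: string[]; // network calls seen on the page, e.g. "POST /api/login"
}

// Stand-in for the browser step. Synchronous only to keep the sketch short.
type Capture = (url: string) => PageCapture | undefined;

// Breadth-first walk from startUrl that stays on the app's origin,
// visits each URL at most once, and stops after maxPages captures.
function explore(
  startUrl: string,
  origin: string,
  capture: Capture,
  maxPages = 50,
): PageCapture[] {
  const seen = new Set<string>([startUrl]);
  const queue: string[] = [startUrl];
  const captured: PageCapture[] = [];

  while (queue.length > 0 && captured.length < maxPages) {
    const url = queue.shift() as string;
    const page = capture(url);
    if (page === undefined) continue; // unreachable or failed to load

    captured.push(page);
    for (const link of page.links) {
      const sameOrigin = link === origin || link.startsWith(origin + "/");
      if (sameOrigin && !seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return captured;
}
```

With Playwright, the capture callback would typically navigate with page.goto(url), read the links from the DOM, and collect requests through page.on('request', ...). The page cap is there so a large app cannot turn exploration into an unbounded crawl.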

Write the use cases down first — in plain language

Skipping straight from "crawl" to "generated test code" is how you end up with a pile of brittle scripts nobody trusts. Put a human-readable layer in between: for each flow, a short use case and a plan listing the test cases and their acceptance criteria. This does two things:

It lets a reviewer sanity-check coverage before any code is generated: "you tested login, but not the rejection cases."
It gives every generated test a traceable reason to exist, so six months later you know why a test is there.
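One possible shape for that layer, as a sketch: a use case holds its test cases and acceptance criteria, and a small check compares the plan against the generated test titles. The ids, the field names, and the convention of putting the case id in brackets at the start of a test title are assumptions made for this example, not a standard.

```typescript
interface TestCase {
  id: string;           // e.g. "UC-01-TC-02"
  title: string;
  acceptance: string[]; // plain-language acceptance criteria
}

interface UseCase {
  id: string;
  flow: string;
  summary: string;
  cases: TestCase[];
}

// Example plan entry for the sign-in flow (illustrative content).
const signIn: UseCase = {
  id: "UC-01",
  flow: "sign-in",
  summary: "A registered user signs in with email and password.",
  cases: [
    { id: "UC-01-TC-01", title: "valid credentials", acceptance: ["user lands on the account page"] },
    { id: "UC-01-TC-02", title: "wrong password", acceptance: ["an error message is shown", "user stays on the sign-in page"] },
  ],
};

// Assumed convention: a generated test title starts with its case id in brackets.
function caseIdOf(testTitle: string): string | null {
  const match = /^\[(UC-\d+-TC-\d+)\]/.exec(testTitle);
  return match ? match[1] : null;
}

// Reports planned cases with no test, and tests with no planned case behind them.
function traceability(
  plan: UseCase[],
  testTitles: string[],
): { uncovered: string[]; untraced: string[] } {
  const planned = new Set<string>();
  for (const useCase of plan) {
    for (const testCase of useCase.cases) planned.add(testCase.id);
  }

  const covered = new Set<string>();
  const untraced: string[] = [];
  for (const title of testTitles) {
    const id = caseIdOf(title);
    if (id !== null && planned.has(id)) covered.add(id);
    else untraced.push(title);
  }

  const uncovered = Array.from(planned).filter((id) => !covered.has(id));
  return { uncovered, untraced };
}
```

Given the plan above and only a "[UC-01-TC-01] ..." test, the report lists UC-01-TC-02 as uncovered, which is the "you tested login, but not the rejection cases" review comment in machine-checkable form.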

The AI trap: hallucinated selectors and tests that pass by accident

Generating Playwright code with an LLM is easy. Generating Playwright code that's real is not. Two failure modes dominate:

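Hallucinated selectors. The model invents data-testids or CSS paths that don't exist in your DOM. The test looks plausible, then times out waiting for an element that never appears, or worse, passes because an assertion such as toBeHidden() is satisfied by a locator that matches nothing.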
Tests that pass by accident. An assertion so loose it can never fail (expect(page).toBeTruthy() energy) reports green while testing nothing.

The fix for both is grounding: every selector must be verified against the DOM that was actually captured, and every test must be run headless and pass for the right reason before it's accepted. A test that has never executed against the real app is a guess, not a test.
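A minimal sketch of the selector half of grounding, assuming the captured DOM is available as an HTML string and the generated test uses getByTestId. The function names are hypothetical, and the regex matching is a simplification for illustration; it does not parse HTML or TypeScript properly.

```typescript
// Collects every data-testid value present in the captured HTML.
function testIdsInDom(capturedHtml: string): Set<string> {
  const ids = new Set<string>();
  const attribute = /data-testid\s*=\s*["']([^"']+)["']/g;
  let match: RegExpExecArray | null;
  while ((match = attribute.exec(capturedHtml)) !== null) ids.add(match[1]);
  return ids;
}

// Collects every id passed to getByTestId(...) in the generated test source.
function testIdsInTest(testSource: string): string[] {
  const ids: string[] = [];
  const call = /getByTestId\(\s*["'`]([^"'`]+)["'`]\s*\)/g;
  let match: RegExpExecArray | null;
  while ((match = call.exec(testSource)) !== null) ids.push(match[1]);
  return ids;
}

// Ids the test relies on that never appeared in the captured DOM.
function hallucinatedTestIds(capturedHtml: string, testSource: string): string[] {
  const real = testIdsInDom(capturedHtml);
  return testIdsInTest(testSource).filter((id) => !real.has(id));
}
```

A non-empty result is a reason to reject the generated test before running it. Against a live page, the equivalent check is that page.locator(selector).count() returns at least 1. Neither replaces the second half of grounding: the test still has to run headless and pass for the right reason.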

Make maintenance part of generation, not an afterthought

A suite that's correct on Tuesday and abandoned by Friday isn't coverage. When the UI shifts and a test breaks, the right response depends on why it broke:

If the app changed intentionally and the test is now watching the wrong thing → fix the test.
If the app actually broke → that's a real bug; a human should see it, not have it auto-patched away.

The dangerous shortcut is "heal to green" — quietly relaxing assertions until red turns green. That destroys the only thing a test is for. Repair should correct how a test observes reality, never lower the bar the app must clear. We wrote up a real example of exactly this — two tests that failed because a locator matched the wrong element, fixed without weakening a single assertion — in this case study.
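One way to enforce that rule mechanically, sketched under simplifying assumptions: compare the assertions of a test before and after a repair, and accept the repair only if the matcher calls are unchanged. The function names are hypothetical, and the parsing is a line-based heuristic that expects one expect(...) per line.

```typescript
// Returns the matcher call of every assertion line, in order,
// e.g. "toHaveText('$10.00')". The locator inside expect(...) is ignored,
// because correcting the locator is exactly what a repair is allowed to do.
function assertionsOf(testSource: string): string[] {
  const matcher = /\)\.((?:not\.)?to[A-Z]\w*\(.*\));?\s*$/;
  const found: string[] = [];
  for (const line of testSource.split("\n")) {
    if (line.indexOf("expect(") === -1) continue;
    const match = matcher.exec(line);
    // Lines the heuristic cannot parse are compared verbatim, which errs on the strict side.
    found.push(match ? match[1] : line.trim());
  }
  return found;
}

// True only if the repaired test asserts the same things, in the same order.
function repairKeepsAssertions(before: string, after: string): boolean {
  const original = assertionsOf(before);
  const repaired = assertionsOf(after);
  return (
    original.length === repaired.length &&
    original.every((assertion, i) => assertion === repaired[i])
  );
}
```

Swapping page.locator('.price') for page.getByTestId('cart-total') while keeping toHaveText('$10.00') passes this check. Replacing toHaveText('$10.00') with toBeVisible(), or deleting the assertion, does not, and should go to a human instead of being merged.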

A checklist you can hold any approach to

Did it explore the real running app, or guess from a URL?
Are the flows written down in plain language before code?
Is every selector grounded in the actual DOM?
Did every test run and pass for the right reason before acceptance?
When a test breaks, does it distinguish a test problem from a real bug?

If the answer to any of those is "no," you have generated code, not a test suite.

Have an app with no tests and no docs? That's exactly the case vibe·qa is built for. Pick one deliverable — a Playwright suite, an API suite, manual cases — and we'll build it for free on your running app.
