Opinion

Why test suites rot — and what "self-healing" should actually mean


Test suites don't fail loudly. They rot quietly. Nobody schedules "let the coverage decay"; it just happens, one skipped test and one loosened assertion at a time, until a green build means nothing and everyone has stopped trusting it.

How rot actually sets in

It's rarely one big event. It's a hundred small, individually-reasonable decisions:

A UI change breaks 12 tests before a release. There's no time. Someone adds test.skip to "deal with later."
A flaky test fails 1 run in 10. Instead of fixing the race, someone adds a waitForTimeout and a softer assertion.
A test that used to check a total now just checks the page loaded, because the total selector kept changing.
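
The first two kinds of decision leave fingerprints in the test source, so they can at least be counted. Here is a minimal sketch of such an audit, assuming Playwright-style test files. The function name `countRotSignals` and the pattern list are hypothetical and far from exhaustive. The third kind, an assertion swapped for a weaker one, looks like perfectly ordinary code and will not show up in a scan like this.

```typescript
// Hypothetical rot audit: counts source patterns that tend to signal decay.
// The pattern list is an illustrative assumption, not a complete rule set.
type RotSignal = "skipped" | "fixedWait";

const ROT_PATTERNS: Record<RotSignal, RegExp> = {
  // Matches test.skip, it.skip and describe.skip
  skipped: /\b(?:test|it|describe)\.skip\b/g,
  // Matches fixed sleeps such as page.waitForTimeout(3000)
  fixedWait: /\.waitForTimeout\s*\(/g,
};

function countRotSignals(source: string): Record<RotSignal, number> {
  const counts: Record<RotSignal, number> = { skipped: 0, fixedWait: 0 };
  for (const signal of Object.keys(ROT_PATTERNS) as RotSignal[]) {
    // String.prototype.match with a global regex returns every match, or null
    counts[signal] = (source.match(ROT_PATTERNS[signal]) || []).length;
  }
  return counts;
}

// Usage: a made-up spec file with one skipped test and one fixed sleep.
const sampleSpec = `
test.skip("checkout total", async ({ page }) => {});
test("cart loads", async ({ page }) => {
  await page.waitForTimeout(3000);
});
`;
const report = countRotSignals(sampleSpec);
console.log(report);
```

Tracking these counts over time says more than any single snapshot: a number that only ever climbs is the rot described above.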

Each step trades a little truth for a little speed. Individually defensible. Collectively fatal. Six months later the suite is 400 tests, 380 green, and not one of them would catch the bug that takes down checkout.

"Self-healing" is where it gets dangerous

Automated healing sounds like the cure. Often it's the accelerant. Most "self-healing testing" optimises for one metric: turn red into green with no human involved. Read that goal carefully: it is exactly what a team ends up optimising for while its suite rots.

Healing to green: a test fails, so the tool relaxes the assertion, swaps in a looser locator, or retries until it passes. The build goes green. Whether the app is correct is now unknown — and unknowable, because the test that was supposed to answer that question just got neutered.

If a real bug caused the failure, healing-to-green doesn't just miss it — it actively hides it and reports success. That's worse than no test at all, because now you're paying for false confidence.
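
The retry part of this is easy to put numbers on. As a rough sketch, assume each run of a test fails independently with a fixed probability (a simplification; real failures are often correlated). The helper name `chanceRetriesGoGreen` is hypothetical. It computes how often "retry until it passes" ends in a green build.

```typescript
// Probability that at least one of `attempts` runs passes, assuming each run
// fails independently with probability `failureRate` (a simplifying assumption).
function chanceRetriesGoGreen(failureRate: number, attempts: number): number {
  if (failureRate < 0 || failureRate > 1) {
    throw new RangeError("failureRate must be between 0 and 1");
  }
  if (!Number.isInteger(attempts) || attempts < 1) {
    throw new RangeError("attempts must be a positive integer");
  }
  // The build stays red only if every single attempt fails.
  return 1 - Math.pow(failureRate, attempts);
}

// A race that breaks 1 run in 10, given up to 3 attempts:
// 1 - 0.1^3, so roughly 0.999 of builds report green.
const intermittentBug = chanceRetriesGoGreen(0.1, 3);

// Even a feature that fails 9 runs in 10 still goes green some of the time.
const mostlyBroken = chanceRetriesGoGreen(0.9, 3);

console.log(intermittentBug.toFixed(3), mostlyBroken.toFixed(3));
```

Under those assumptions, a bug that hits one user in ten would turn the build red about once in a thousand runs.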

What healing should mean: heal to true

There's a version of self-healing worth having. The difference is entirely in the objective. Instead of "make it green," the goal is "make the test observe reality correctly, and if reality is broken, tell a human."

Healing to true: when a test fails, first classify why. Is the app wrong, or is the test watching the wrong thing? Fix only the test flaw — the stale locator, the bad wait — and never touch the assertion. If the app is actually broken, escalate it as a real bug. A human decides, not a retry loop.
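
That policy can be written down as a small decision function. This is a sketch under stated assumptions, not any particular tool's implementation: the types `FailureCause`, `ProposedFix` and `Decision` and the function `decide` are all hypothetical names. It also deliberately leaves out the hard part, which is working out the cause in the first place.

```typescript
// Hypothetical model of a heal-to-true policy. All names are illustrative.
type FailureCause = "stale-locator" | "bad-wait" | "app-defect" | "unknown";
type ChangeKind = "locator" | "wait" | "assertion" | "skip";

interface ProposedFix {
  kind: ChangeKind;
  description: string;
}

type Decision =
  | { action: "apply-fix"; fix: ProposedFix }
  | { action: "escalate"; reason: string };

function decide(cause: FailureCause, fix?: ProposedFix): Decision {
  // A broken app, or a failure nobody can explain, always goes to a human.
  if (cause === "app-defect" || cause === "unknown") {
    return { action: "escalate", reason: `cause is ${cause}; a human decides` };
  }
  if (!fix) {
    return { action: "escalate", reason: "test flaw diagnosed but no fix proposed" };
  }
  // Assertions are never relaxed and tests are never skipped automatically.
  if (fix.kind === "assertion" || fix.kind === "skip") {
    return { action: "escalate", reason: `refusing to auto-apply a ${fix.kind} change` };
  }
  // The fix has to address the flaw that was actually diagnosed.
  const expectedKind: ChangeKind = cause === "stale-locator" ? "locator" : "wait";
  if (fix.kind !== expectedKind) {
    return { action: "escalate", reason: `a ${fix.kind} fix does not match cause ${cause}` };
  }
  return { action: "apply-fix", fix };
}

// Usage: a stale locator may be repaired; a loosened assertion may not.
console.log(decide("stale-locator", { kind: "locator", description: "scope to product card" }));
console.log(decide("stale-locator", { kind: "assertion", description: "check page loaded only" }));
console.log(decide("app-defect"));
```

Note that every uncertain branch ends in `escalate`. The function can only ever say yes to a locator or wait change that matches the diagnosis.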

A concrete example: on a real run, two add-to-cart tests failed. The naive move is to loosen them until green. Instead the failure was diagnosed — the add-to-cart worked; the locator was resolving to the wrong element after a slide-in cart widget appeared. The fix was one line in a shared helper to target the correct element. No assertion was relaxed. No spec was changed. The tests still demand exactly what they demanded before — they just watch the right thing now. Full write-up here.
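
The actual helper from that run isn't reproduced here, so what follows is a hypothetical model of the same kind of failure, using plain objects instead of a real browser. Every name in it (`PageElement`, `locateLoose`, `locateScoped`, the region labels) is an assumption made for illustration. It shows how a lookup by text alone can start resolving to a different element once a widget is injected ahead of the original, and how scoping the lookup fixes that without touching what the test asserts.

```typescript
// Hypothetical page model: elements in document order, tagged with a region.
interface PageElement {
  region: "product-card" | "cart-widget";
  text: string;
}

// Loose lookup: first element with matching text, wherever it lives.
function locateLoose(page: PageElement[], text: string): PageElement | undefined {
  return page.find((el) => el.text === text);
}

// Scoped lookup: same text match, restricted to one region of the page.
function locateScoped(
  page: PageElement[],
  region: PageElement["region"],
  text: string
): PageElement | undefined {
  return page.find((el) => el.region === region && el.text === text);
}

const pageBeforeWidget: PageElement[] = [
  { region: "product-card", text: "Add to cart" },
];

// Assumed for the example: the widget lands earlier in document order and
// carries a button with the same label.
const pageAfterWidget: PageElement[] = [
  { region: "cart-widget", text: "Add to cart" },
  { region: "product-card", text: "Add to cart" },
];

// The loose lookup silently changes target once the widget appears.
console.log(locateLoose(pageBeforeWidget, "Add to cart")?.region); // product-card
console.log(locateLoose(pageAfterWidget, "Add to cart")?.region); // cart-widget

// The scoped lookup keeps watching the element the test was written for.
console.log(locateScoped(pageAfterWidget, "product-card", "Add to cart")?.region); // product-card
```

In this model the repair lives entirely in how the element is found. Whatever the test checks afterwards stays as strict as it was.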

The test you can apply to any tool

Next time a vendor says "self-healing," ask one question: when a test fails because the app is genuinely broken, what does your system do?

If the answer is "heals it and moves on" — run.
If the answer is "distinguishes a test problem from a real bug, fixes only the former, and escalates the latter to a human" — that's a tool that keeps your suite true, which is the only reason to have a suite at all.

Tired of a suite you don't trust? vibe·qa builds coverage that's checked at every step and kept true — not just green. Pick one deliverable and we'll build it free on your running app.
