Testing Software In The Age Of AI: Why Nobody Has Won And The Resurgence Of Code As King

With the rise of AI-assisted development, software engineers are shipping at a pace that would have seemed impossible a couple of years ago. And yet the sheer volume and uneven reliability of AI-generated code leave developers less confident than ever that what they ship is correct.

To combat this problem, companies (including mine) have been working on a new crop of AI-native software testing products, but adoption is low, and no player has dominated the space yet. How did we get here, and why hasn’t anyone won?

The Problem That Was Almost Solved

The testing landscape has traditionally been divided into three layers: unit tests, integration tests and end-to-end (E2E) tests. The first two are largely solved: mature frameworks exist, the patterns are well understood, and unit and integration suites are relatively easy to build and maintain.

End-to-end testing is where things get complicated. Tools like Playwright and Cypress have become the default choice for E2E testing. If you’ve worked in a modern web codebase, you’ve almost certainly encountered one or both.

But widespread adoption doesn’t mean the problem is solved. Tests can be flaky because of timing issues, non-deterministic UI behavior and race conditions, or brittle because their selectors are tied to CSS classes or DOM structures that change constantly. As a result, many teams give up on comprehensive E2E coverage and fall back to manual, human-led QA, which scales poorly and catches problems late.
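To make the brittleness problem concrete, here is a small TypeScript illustration. It uses a hypothetical, in-memory model of page elements rather than a real browser, so it only demonstrates the idea: a locator keyed to a generated CSS class breaks when styling changes, while one keyed to what the user perceives (role and visible label) survives. Playwright encourages the same approach through locators such as `page.getByRole('button', { name: 'Sign up' })`; the helper functions below are illustrative and not part of any framework.

```typescript
// Hypothetical, in-memory model of rendered elements; a real test queries a live DOM.
interface UiElement {
  tag: string;
  classes: string[];
  role: string;
  label: string; // accessible name, e.g. a button's visible text
}

// Brittle: tied to a generated styling class that a CSS refactor can rename.
function findByClass(dom: UiElement[], cls: string): UiElement | undefined {
  return dom.find((el) => el.classes.includes(cls));
}

// More resilient: tied to what the user perceives (role plus visible label).
function findByRole(dom: UiElement[], role: string, label: string): UiElement | undefined {
  return dom.find((el) => el.role === role && el.label === label);
}

const beforeRedesign: UiElement[] = [
  { tag: "button", classes: ["btn", "css-1x7f2"], role: "button", label: "Sign up" },
];

// The same button after a styling change: only the generated class differs.
const afterRedesign: UiElement[] = [
  { tag: "button", classes: ["btn", "css-9q3ab"], role: "button", label: "Sign up" },
];

console.log(findByClass(beforeRedesign, "css-1x7f2") !== undefined); // true
console.log(findByClass(afterRedesign, "css-1x7f2") !== undefined); // false: the test breaks
console.log(findByRole(afterRedesign, "button", "Sign up") !== undefined); // true
```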

The Promise Of No-Code Agentic Testing

Recently, a category of tools has emerged with a compelling pitch: What if you could describe a test in plain English, point an AI agent at your application, and have it figure out the rest? No selectors. No scripts. Just intent. A non-technical team member types “verify that a new user can complete signup and land on the dashboard,” and an agent navigates the browser, clicks through the flow and reports back.

Agentic testing has become one of the hottest and most competitive niches in developer tooling. Yet no tool has achieved anything close to the market dominance of Playwright or Cypress. Why does adoption remain so low?

Why Nobody Has Won

I think the main problem is that end-to-end testing is, at its core, a deeply technical activity disguised as a simple one. Underneath the topmost layer (a user clicking through a browser), a comprehensive E2E setup involves mocking production endpoints, intercepting network requests, seeding databases, making precise assertions at every layer of the stack and integrating tightly with CI pipelines.

Making an agent generic enough to handle the full surface area and meet the expectations of every team out there is, practically speaking, an impossible task. Everyone’s setup is slightly different.
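The layers beneath the browser are easier to see in miniature. The sketch below is a toy, offline stand-in with hypothetical names throughout (`RequestInterceptor`, `seedDatabase`, `signUp`): it mocks an endpoint, records intercepted requests, seeds a fake database and then asserts on the UI outcome, the network traffic and the persisted data. Real suites would use their framework's own facilities instead, for example Playwright's `page.route()` for request interception.

```typescript
// Every name here is hypothetical; this is an offline toy, not a framework API.
type StubResponse = { status: number; body: unknown };
type User = { email: string; plan: string };

// Answers outgoing requests from a table of canned responses and records each call.
class RequestInterceptor {
  private routes = new Map<string, StubResponse>();
  readonly calls: string[] = [];

  mock(method: string, path: string, response: StubResponse): void {
    this.routes.set(`${method} ${path}`, response);
  }

  handle(method: string, path: string): StubResponse {
    const key = `${method} ${path}`;
    this.calls.push(key);
    const stub = this.routes.get(key);
    // Failing loudly on unmocked traffic keeps the test from touching production.
    if (stub === undefined) throw new Error(`Unmocked request: ${key}`);
    return stub;
  }
}

// Seed data: a plain Map standing in for a real database fixture.
function seedDatabase(): Map<string, User> {
  return new Map<string, User>([["u1", { email: "existing@example.com", plan: "free" }]]);
}

// The "application" under test, reduced to a single signup flow.
function signUp(net: RequestInterceptor, db: Map<string, User>, email: string): string {
  const res = net.handle("POST", "/api/signup");
  if (res.status !== 201) return "error-page";
  db.set(`u${db.size + 1}`, { email, plan: "free" });
  return "dashboard";
}

const net = new RequestInterceptor();
net.mock("POST", "/api/signup", { status: 201, body: { ok: true } });
const db = seedDatabase();

const landedOn = signUp(net, db, "new@example.com");

// Assertions at three layers: where the UI ended up, what went over the wire, what was stored.
console.log(landedOn); // dashboard
console.log(net.calls); // [ 'POST /api/signup' ]
console.log(db.get("u2")?.email); // new@example.com
```

Even this toy version encodes decisions (which routes are mocked, what happens to unmocked traffic, what the seed data looks like) that differ from team to team, and that is the surface area a generic agent would have to cover.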

Then there’s the logistical overhead: Agents are slow and expensive. A script that runs in four seconds is replaced by an agent that takes 45. At scale, this compounds fast. And agents introduce a new form of flakiness stemming from the fact that LLMs are non-deterministic. Tests that passed yesterday may fail today, not because the application changed, but because the agent decided to behave differently.
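Both costs compound in a way that is easy to underestimate, so a back-of-the-envelope model helps. The sketch below reuses the four-second and 45-second figures from above; the 200-test suite size and the per-test spurious failure rates are hypothetical inputs, and the green-run probability assumes failures are independent across tests, which real suites rarely guarantee.

```typescript
// Back-of-the-envelope model. Suite size and flake rates are hypothetical inputs.

// Serial wall-clock minutes for a suite; parallel workers would divide this.
function suiteMinutes(tests: number, secondsPerTest: number): number {
  return (tests * secondsPerTest) / 60;
}

// Probability that every test passes when nothing is actually broken,
// assuming each test fails spuriously with probability p, independently: (1 - p)^n.
function chanceOfGreenRun(tests: number, flakeRate: number): number {
  return Math.pow(1 - flakeRate, tests);
}

console.log(suiteMinutes(200, 4).toFixed(1)); // 13.3  (scripted, 4 s per test)
console.log(suiteMinutes(200, 45).toFixed(1)); // 150.0 (agentic, 45 s per test)

console.log(chanceOfGreenRun(200, 0.001).toFixed(3)); // 0.819 (0.1% flake rate per test)
console.log(chanceOfGreenRun(200, 0.01).toFixed(3)); // 0.134 (1% flake rate per test)
```

Under these assumptions, a 1% per-test flake rate means a 200-test suite comes back fully green only about 13% of the time, even when the application is fine.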

There’s also a more human factor that tends to get overlooked: Tests have always been an extension of the codebase. They live in the repository; they get reviewed; and they encode institutional knowledge about how the application is supposed to behave. Unsurprisingly, developers are reluctant to strip out such a core component of their infrastructure and hand it to a third-party SaaS dashboard they don’t control.

Here’s where the story takes an unexpected turn.

The Resurgence Of Code As King

The no-code testing movement was based upon the thesis that writing and maintaining test scripts was a bottleneck, requiring engineering time and skill that many teams couldn’t spare. That assumption made sense when it was formed, but is less true today.

LLMs are exceptional at writing code. Generating a test, refactoring a fragile selector or maintaining a test suite are the kinds of tasks where AI coding shines. Engineering teams have already adopted AI-assisted workflows for writing code, so naturally, this should extend to test authorship as well. If anything, tests are easier for LLMs to write well, because they’re structured, constrained and have clear success criteria.

The category that was trying to abstract away code is now competing against a world in which writing code is no longer the bottleneck.

What The Answer Looks Like

I believe that the solution to E2E testing isn’t to replace scripts with agents, but rather, to augment scripts with AI, precisely in the places where deterministic logic falls short:

• Flakiness: Instead of hardcoded waits for a page load, have an agent observe the page and determine when it’s ready.

• Brittleness: Instead of a CSS selector, insert a natural-language instruction to find the right element regardless of underlying implementation.

• Subjectivity: Instead of a boolean assertion, have an agent evaluate a screenshot and decide whether the visual output matches intent.

In each case, the test is still code. It still lives in the repository. The developer still authors and owns it. AI is doing the thing AI is actually good at: filling in the gaps where scripted logic is too rigid or too brittle, rather than replacing the entire substrate.
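One way to picture this division of labor is a test that stays ordinary code and calls out to an AI helper only at the three points listed above. Everything here is hypothetical: `AiAssist` is not any product's API, and `fakeAi` uses simple string heuristics in place of model calls so the sketch runs offline.

```typescript
// Hypothetical interface: AI fills three narrow gaps in an otherwise scripted test.
interface AiAssist {
  // Flakiness: judge whether the page looks ready instead of sleeping a fixed time.
  isReady(pageText: string): boolean;
  // Brittleness: map a plain-English description to an element id.
  locate(description: string, labels: Record<string, string>): string | undefined;
  // Subjectivity: judge whether a screenshot matches the stated intent.
  looksRight(screenshotDescription: string, intent: string): boolean;
}

// Offline stand-in using string heuristics; a real helper would call a model here.
const fakeAi: AiAssist = {
  isReady: (pageText) => !pageText.includes("Loading"),
  locate: (description, labels) =>
    Object.keys(labels).find((id) => description.toLowerCase().includes(labels[id].toLowerCase())),
  looksRight: (shot, intent) => shot.toLowerCase().includes(intent.toLowerCase()),
};

// The test itself is plain code: it lives in the repository and is reviewed like any other.
function signupTest(ai: AiAssist): string[] {
  const log: string[] = [];

  // Simulated page states over time; a real test would read these from the browser.
  const frames = ["Loading...", "Loading...", "Create your account"];
  const readyAt = frames.findIndex((frame) => ai.isReady(frame));
  if (readyAt < 0) throw new Error("page never became ready");
  log.push(`ready at frame ${readyAt}`);

  const labels: Record<string, string> = { "el-42": "Sign up", "el-43": "Log in" };
  const target = ai.locate("the Sign up button", labels);
  if (target === undefined) throw new Error("could not find the signup button");
  log.push(`clicked ${target}`);

  // Simulated screenshot description; a real test would capture an image.
  const passed = ai.looksRight("Dashboard with a welcome banner", "dashboard");
  log.push(passed ? "visual check passed" : "visual check failed");
  return log;
}

console.log(signupTest(fakeAi));
// [ 'ready at frame 2', 'clicked el-42', 'visual check passed' ]
```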

Of course, augmenting scripts with AI doesn’t come without its own challenges. The same non-determinism that fully agentic testing suffers from doesn’t disappear when you narrow the agent’s scope. An AI-powered selector or visual assertion can still behave differently between runs, and debugging a failure now requires reasoning about both your application logic and the model’s judgment.

To mitigate this, developers have to be thoughtful about where the line sits; over-applying AI may produce a result that is equally or more fragile for no real gain. A few practices help: pin model versions so you can detect drift; invest in good observability so failures are easy to triage; and add deterministic fallbacks to AI-powered steps wherever possible.
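The fallback and observability practices can be combined in a single wrapper. In this sketch (all names hypothetical, including the `PINNED_MODEL` identifier), the AI-powered step runs first; if it returns nothing, or an element id that does not exist on the page, a deterministic rule takes over, and every decision is logged alongside the pinned model id so a failing run can be traced to either the application or the model.

```typescript
// Hypothetical pinned identifier; a real suite would record its provider's exact model version.
const PINNED_MODEL = "example-model-2025-01-01";

type Locator = (labels: Record<string, string>) => string | undefined;

interface StepRecord {
  step: string;
  source: "ai" | "fallback";
  model: string;
  result: string | undefined;
}

// Run the AI-powered step first; if it returns nothing, or an id that is not on the page,
// fall back to a deterministic rule. Every decision is logged for triage.
function locateWithFallback(
  step: string,
  labels: Record<string, string>,
  aiLocate: Locator,
  deterministic: Locator,
  log: StepRecord[],
): string | undefined {
  const aiResult = aiLocate(labels);
  if (aiResult !== undefined && Object.prototype.hasOwnProperty.call(labels, aiResult)) {
    log.push({ step, source: "ai", model: PINNED_MODEL, result: aiResult });
    return aiResult;
  }
  const fallbackResult = deterministic(labels);
  log.push({ step, source: "fallback", model: PINNED_MODEL, result: fallbackResult });
  return fallbackResult;
}

// Usage: the simulated AI step returns an id that does not exist on the page.
const labels: Record<string, string> = { "el-42": "Sign up", "el-43": "Log in" };
const stepLog: StepRecord[] = [];
const unreliableAi: Locator = () => "el-99";
const byExactLabel: Locator = (ls) => Object.keys(ls).find((id) => ls[id] === "Sign up");

console.log(locateWithFallback("click sign up", labels, unreliableAi, byExactLabel, stepLog)); // el-42
console.log(stepLog[0].source); // fallback
```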

Deciding exactly which tests would benefit from AI takes time and effort, and, ironically, this is precisely the kind of work developers want to eliminate as they refine their QA processes. Ultimately, AI augmentation has to earn its place in the suite, and where that line falls will look different for every team.

The teams that crack this could look less like they invented something new, and more like they finished what others started.

https://www.forbes.com/councils/forbesbusinesscouncil/2026/05/28/testing-software-in-the-age-of-ai-why-nobody-has-won-and-the-resurgence-of-code-as-king/
