AI in QA - Issue #7

AI Amplifies What's Already There

Throwing AI at a broken process still produces output. It just isn't the right output.

Wayne Roseberry points out that self-healing tests are a patch on brittle design, not a solution. Clipboard Health discovered their agents couldn't iterate because their tests were lying to them. Janna Loeffler makes the case that AI-generated test cases shift work from thinking to reviewing, and that's not always the win it looks like. The QA Queen's post puts it plainly: AI without a process is just autocomplete.

The ceiling on what AI can do for your Quality team is set by the quality of your tests, your architecture, and your process. Not the model you're using.

The AI is a multiplier. You choose what it multiplies.

Catch us live on YouTube

We go live shortly after this newsletter lands in your inbox.

Headlines & Launches

AI & Testing Events Roundup: April-May 2026
via Emily O'Connor (LinkedIn)
Emily O'Connor rounds up AI and testing events through the end of May - Spotify's agentic-first development talk, panels on AI vs software quality, Leeds Testing Atelier, Google I/O, Tricentis agentic testing webinar, and more. Get out and learn from each other.

AI, testing, and the DORA AI Capabilities Model
via Lisa Crispin
Lisa Crispin discusses AI, testing, and the DORA AI Capabilities Model with the Beyond Quality podcast hosts. Key takeaways: AI agents degrade over time and need continuous testing, dangerous security pitfalls are underappreciated, and pairing and ensembling matter more than ever in the AI era.

Self-Healing Tests Are a Symptom, Not a Solution
via Wayne Roseberry (LinkedIn)
Wayne Roseberry argues that self-healing test mechanisms treat symptoms of brittle test design rather than root causes. The real fix is improving test architecture and stability, not patching over flakiness with AI-powered locators.

20% AI time
via Rosie Sherry (Ministry of Testing)
Rosie Sherry proposes allocating 20% of employee time to exploring AI tools, inspired by Google's famous 20% project time. Roughly an hour a day for structured experimentation aligned with business goals, rather than risky all-in commitments.

Tools & Frameworks

Passmark: The open-source Playwright library for AI regression testing
via Passmark (Bug0/Hashnode team)
Open-source Playwright library for AI regression testing. Write tests in plain English; the AI executes them on the first run and caches every action to Redis. Subsequent runs replay at native Playwright speed with zero LLM calls. It also auto-heals on UI changes and offers multi-model consensus assertions and built-in email/OTP testing.
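The record-once, replay-many idea can be sketched in a few lines. This is not Passmark's API: `ActionCache`, `resolve_step`, and the `plan_with_llm` callback are hypothetical names, and an in-memory dict stands in for Redis. The point is that the model is only consulted on a cache miss.

```python
from typing import Callable, Dict, List

# Hypothetical action type: one resolved, replayable browser step.
Action = Dict[str, str]


class ActionCache:
    """In-memory stand-in for a Redis-backed action cache (an assumption for this sketch)."""

    def __init__(self) -> None:
        self._store: Dict[str, List[Action]] = {}
        self.llm_calls = 0

    def resolve_step(
        self, step: str, plan_with_llm: Callable[[str], List[Action]]
    ) -> List[Action]:
        # Cache hit: replay the recorded actions without calling the model.
        if step in self._store:
            return self._store[step]
        # Cache miss (first run): ask the model once, then record the result.
        self.llm_calls += 1
        actions = plan_with_llm(step)
        self._store[step] = actions
        return actions

    def invalidate(self, step: str) -> None:
        # Drop a stale recording (e.g. after a UI change) so the next run re-plans.
        self._store.pop(step, None)


def fake_planner(step: str) -> List[Action]:
    # Stand-in for an LLM that turns a plain-English step into concrete actions.
    return [{"op": "click", "selector": "text=Log in"}]


cache = ActionCache()
cache.resolve_step("Click the login button", fake_planner)
cache.resolve_step("Click the login button", fake_planner)
print(cache.llm_calls)  # 1
```

The second call replays from the cache, so only one model call is made; invalidating a step forces a fresh plan on its next run.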

cypress-verify-llm: Confidence-based LLM response verification for Cypress
via Daniil Shapovalov (LinkedIn)
Cypress plugin that replaces binary pass/fail assertions with percentage-based confidence scoring for LLM responses. Supports policy, truthfulness, and semantic summary checks. Zero dependencies, TypeScript, works with Cypress 12+.
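To show the difference between a binary assertion and a confidence score, here is a Python sketch. It is not the plugin's API (which is TypeScript for Cypress). The hypothetical `confidence` function scores a response by the percentage of expected facts it mentions, and `verify` passes only above a threshold. Substring matching is a deliberate simplification; real truthfulness or semantic checks would use a model rather than string search.

```python
from typing import List


def confidence(response: str, expected_facts: List[str]) -> float:
    """Percentage of expected facts mentioned in the response (case-insensitive)."""
    if not expected_facts:
        return 100.0
    text = response.lower()
    hits = sum(1 for fact in expected_facts if fact.lower() in text)
    return 100.0 * hits / len(expected_facts)


def verify(response: str, expected_facts: List[str], threshold: float = 75.0) -> bool:
    # Pass when the score clears the threshold instead of demanding an exact match.
    return confidence(response, expected_facts) >= threshold


reply = "Refunds are processed within 5 business days to the original card."
facts = ["5 business days", "original card", "refund", "email confirmation"]
print(confidence(reply, facts))  # 75.0
print(verify(reply, facts))      # True
```

The threshold becomes a tunable policy: the same reply fails at 90% and passes at 75%, which a strict equality assertion cannot express.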

QA Agent Hub for GitHub Copilot
via Bruno Peres Christino (LinkedIn)
Collection of specialized prompt-driven agents for GitHub Copilot in VS Code covering bug reporting, test planning, coverage analysis, root-cause triage, and automation gap analysis. Open-source on GitHub.

Agent Device: AI-Native Mobile Automation for iOS & Android
via Michał Pierzchala (Callstack)
A lightweight CLI + daemon that lets AI coding agents automate iOS and Android devices using accessibility tree snapshots and ref-based interactions. Inspired by Vercel's agent-browser, it bridges exploratory agent actions with deterministic replay scripts for mobile testing.

Falling behind on test automation and AI adoption? DevClarity's QA Practice gets your team up to speed fast - with hands-on training, proven workflows, and measurable results within 30 days.

Techniques & Tutorials

AI in QA: Keep TestRail Up-to-Date
via Dzmitry Stekanov (Medium)
Practical walkthrough of using ChatGPT custom GPTs with Actions to keep TestRail test cases in sync with automated specs. Reads test code, generates Given/When/Then scenarios, and updates TestRail via API. Over 90% acceptable results after ~20 prompt iterations.

This 1-Hour Dashboard Saves You $1000s in QA Labor
via Ben Fellows (YouTube)
Build a test strategy dashboard in about an hour that rolls up unit, integration, API, and E2E tests into one view. Spot orphan tests, coverage gaps, and places where you're overpaying for high-level tests that add little value.

Agents Can't Iterate Against Tests That Lie
via Rocky Warren (Clipboard Health)
Clipboard Health used multi-agent consensus to triage 174 E2E tests down to 87, driving flake rates from 100% to under 15% in six weeks. They also built a custom Playwright reporter optimized for agent readability and a flaky-test-debugger skill that gives coding agents the context they need to root-cause failures.
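Flake-rate numbers depend on how flakiness is counted, and the summary doesn't say which definition Clipboard Health used. Under one assumed definition, a CI run counts as flaky if any test failed and then passed on retry within that run, and the flake rate is the percentage of such runs. `CaseResult`, `run_is_flaky`, and `flake_rate` are hypothetical names for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    name: str
    attempts: List[bool]  # pass/fail outcome of each attempt within one CI run, in order


def run_is_flaky(results: List[CaseResult]) -> bool:
    # Flaky: some test failed at least once but its final attempt passed.
    # A test that fails every attempt is a real failure, not a flake.
    return any((False in r.attempts) and r.attempts[-1] for r in results)


def flake_rate(runs: List[List[CaseResult]]) -> float:
    """Percentage of CI runs that contained at least one flaky test."""
    if not runs:
        return 0.0
    return 100.0 * sum(run_is_flaky(run) for run in runs) / len(runs)


runs = [
    [CaseResult("login", [False, True]), CaseResult("checkout", [True])],   # flaky
    [CaseResult("login", [True]), CaseResult("checkout", [True])],          # clean
    [CaseResult("login", [True]), CaseResult("checkout", [False, False])],  # real failure
    [CaseResult("login", [True]), CaseResult("checkout", [False, True])],   # flaky
]
print(flake_rate(runs))  # 50.0
```

Separating "failed, then passed" from "failed every time" matters for agents: only the first is noise, while the second is a signal they should act on.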

Research & Data

Stop Asking the LLM to Read the DOM. Ask It to Write a Script That Filters It.
via Anton Angelov (The Testing Frontier)
Anton Angelov reviews the Prune4Web paper: have the LLM write a Python filter script for the DOM instead of feeding it raw HTML. A 0.5B parameter model on a pruned 20-element shortlist hits 88% grounding accuracy - beating a 6x larger model on raw pages (47%). The bottleneck is input quality, not model size.
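The core move is that the model writes a small filtering program instead of reading the whole page. Below is the kind of script such a model might produce, not the paper's code. It parses HTML with Python's standard-library `html.parser`, scores interactive elements by how many task keywords they mention, and keeps a top-N shortlist. The keyword heuristic, the `INTERACTIVE` tag set, and the `top_n=20` default are illustrative assumptions.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple

# Tags treated as interaction candidates (an assumption for this sketch).
INTERACTIVE = {"a", "button", "input", "select", "textarea"}


class ElementCollector(HTMLParser):
    """Collects interactive elements with their attributes and visible text."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: List[Dict[str, str]] = []
        self._open: List[Dict[str, str]] = []  # elements whose text is still being read

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE:
            return
        attr_text = " ".join(f"{name}={value}" for name, value in attrs if value)
        element = {"tag": tag, "attrs": attr_text, "text": ""}
        self.elements.append(element)
        if tag != "input":  # <input> is a void element with no text content
            self._open.append(element)

    def handle_endtag(self, tag):
        if self._open and self._open[-1]["tag"] == tag:
            self._open.pop()

    def handle_data(self, data):
        for element in self._open:
            element["text"] += data.strip() + " "


def shortlist(html: str, keywords: List[str], top_n: int = 20) -> List[Dict[str, str]]:
    """Keep the top_n interactive elements that mention the most task keywords."""
    collector = ElementCollector()
    collector.feed(html)
    scored: List[Tuple[int, int, Dict[str, str]]] = []
    for index, element in enumerate(collector.elements):
        haystack = f"{element['attrs']} {element['text']}".lower()
        score = sum(1 for kw in keywords if kw.lower() in haystack)
        if score > 0:
            scored.append((score, index, element))
    # Highest score first; ties keep document order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [element for _, _, element in scored[:top_n]]


page = """
<nav><a href="/home">Home</a><a href="/cart">Cart</a></nav>
<form>
  <input type="email" name="email" placeholder="Email address">
  <input type="password" name="password">
  <button type="submit">Sign in</button>
</form>
"""
for element in shortlist(page, ["sign in", "email", "password"]):
    print(element["tag"], element["attrs"])
```

Only the compact shortlist, not the raw page, would then be handed to the grounding model, which is where a small model can compete with a much larger one.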

AI and Testing: Improving Retrieval Quality, Part 2
via Jeff Nyman (Tester Stories)
Jeff Nyman runs four controlled experiments against a RAG baseline (Contextual Precision 0.33, Faithfulness 0.33): smaller chunks, more retrieved chunks, a combined approach, and semantic chunking. None improve retrieval quality. The key finding: pure semantic similarity search cannot distinguish between topically relevant content and specific answers. Parameter tuning is not the fix - architectural changes like hybrid search or re-ranking are needed.
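Hybrid search means combining a keyword ranking with a semantic ranking. One common way to merge the two is reciprocal rank fusion (RRF). That technique is chosen here for illustration and is not prescribed by the post. The document IDs and rankings below are made up, and `k=60` is the constant typically used with RRF.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1), so items ranked well by both retrievers rise to the top.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by document ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


semantic = ["overview", "refund-policy", "pricing-faq"]  # topically similar first
keyword = ["refund-policy", "refund-form", "overview"]   # exact-term matches first
print(reciprocal_rank_fusion([semantic, keyword]))
# ['refund-policy', 'overview', 'refund-form', 'pricing-faq']
```

Semantic search alone puts the general "overview" first. Fusing in the keyword ranking promotes "refund-policy", the document that both retrievers agree on, which targets the relevant-versus-specific gap described above.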

Foundations

APIs Are Predictable. AI Isn't. Here's What I Built.
via QA Queen (Elementor Engineers / Medium)
A QA engineer built a Cursor skill (api-test-generator) that gives AI agents a structured workflow for generating API tests - workspace resolution, route inference, framework detection, and a secrets gate before code generation. The key insight: AI without a process is just autocomplete.
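A "secrets gate" can be approximated as a check that runs before any code is generated and refuses to continue if the input looks like it contains credentials. This sketch is not the api-test-generator skill's implementation. `find_secrets`, `secrets_gate`, and the regex patterns are hypothetical and far from exhaustive.

```python
import re
from typing import List

# Illustrative patterns only; dedicated secret scanners use many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[a-z0-9\-._~+/]{20,}=*"),
    "hardcoded_password": re.compile(r"(?i)\bpassword['\"]?\s*[:=]\s*['\"][^'\"]+['\"]"),
}


def find_secrets(text: str) -> List[str]:
    """Return the names of the patterns that match anywhere in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]


def secrets_gate(context: str) -> None:
    # Stop before generation instead of letting secrets leak into generated tests.
    found = find_secrets(context)
    if found:
        raise ValueError(f"Secrets gate blocked generation: {', '.join(found)}")


secrets_gate("GET /users/{id} returns 200 with a JSON body")  # passes silently
try:
    secrets_gate('login payload: {"password": "hunter2"}')
except ValueError as err:
    print(err)  # Secrets gate blocked generation: hardcoded_password
```

Placing the gate before generation, rather than scanning the output afterwards, means the model never sees the credential in the first place.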

Stop Letting AI Write Your Test Cases (You're Slowing Yourself Down)
via Janna Loeffler
AI-generated test cases don't eliminate work - they shift it from thinking to reviewing, from intent to volume. The real value comes from letting AI handle scale while humans handle judgment, risk assessment, and deciding what not to test.

Quick Links

9+ Software Testing Agents to Know (Daniel Knott, YouTube)

Why Your AI Agents Are Framework-Bleeding (and How to Fix It) (Shreya Agrawal, LinkedIn)

WebdriverIO MCP 3.3.0 Adds HTTP Transport for Local LLMs (Vince Graics, LinkedIn)

Same Prompt, Same AI: Why Your Scaffold Matters More Than Your Prompt (Ivan Davidov, LinkedIn)

If something in this issue made you think differently about how your team approaches AI in testing, pass it along. The best conversations about AI and QA are happening in Slack channels and stand-ups, not just newsletters.

Have something worth featuring? Reply and send it my way; I read every link.

Thanks for reading,
Butch Mayhew