AI in QA - Issue #8

Learning Opportunity

AI in QA isn't a trend you can afford to watch from the sidelines. Breakpoint, a free, global conference put on by BrowserStack, brings together the people who are actually building and testing with AI. Register free and spend three days getting ahead of the curve.

Get Your Free Ticket Today

AI Content Slop is Real

This week I've noticed even more posts that go something like this: "Most testers are still..." As I scroll, I see the same pattern: "Instead of ____, do _____," followed by "How this changes the game" and "That means ....". I can tell all of these posts were generated with little to no thought put into them. Oh, and you can't forget the infographic featuring at least 22 icons summarizing the post. These posts typically don't include any practical information about how to accomplish this changed game, just a bunch of hashtags.

It honestly made me sad. While there is a lot of innovative stuff happening in the Quality and Testing space around leveraging AI in our work, there is also a lot of garbage. I hope to continue to be a slop filter through this newsletter and curated content. As I've continued to work through what makes the cut for each issue versus what gets filtered out, there are a few things that I typically look for.

Is the content thoughtful?

If there is reasoning behind the thoughts shared, I'm willing to include the article regardless of whether I completely agree with its concepts, if only to challenge my beliefs and continue to shape my thoughts and reasoning.

Is the content actionable?

One thing I typically look for is a GitHub repo with an implementation of the concept or a detailed enough story so that I could replicate the actions discussed that lead to some outcome.

Is the content useful?

Is there something of value to either a beginner or someone more experienced? LinkedIn posts that are full of one-liners are typically not useful. A better approach would be to take one of those points, dive in, and break things down in a thoughtful, actionable way. One of the articles I read this week was from James Bach: Is There Something in Your "I"? A quote reads:

Even if you are an ardent AI fanboy, wouldn’t you prefer to prompt AI for yourself instead of reading someone else’s article generated from a prompt?

Well said, James! One other special shoutout to Patrick Prill (Test Pappy) on a solid 42-day blog post streak! While I haven't read every article, the ones I did read helped shape a lot of my thoughts around AI in Testing and QA. Go check them out.

Headlines & Launches

AI Test Generation vs. Test Automation Systems
Ivan Barajas Vargas
Ivan Barajas Vargas draws a sharp line between AI test generation and a real test automation system. Vibe testing gets you 60 tests fast, but scaling to 500+ without architecture, flakiness management, or coverage strategy leads to more manual QA than you started with.

The State of Open-source AI-powered Test Automation
via Alumnium (Alex Rodionov)
Alex Rodionov benchmarks three open-source AI test automation approaches - test generators (LaVague), test runners (Shortest), and test libraries (Alumnium) - using a simple calculator test, comparing code quality, execution speed, and cost per run.

Tools & Frameworks

Alumnium - AI-Native End-to-End Testing Library
GitHub Repo
AI-native library and MCP server for end-to-end testing that works with Playwright, Selenium, and Appium. Uses natural language for actions and assertions, letting you write tests like al.do('click the login button') and al.check('page shows welcome message').

Scaling Agentic Accessibility Audits with Browser-Side Scans
Cameron Cundiff (LinkedIn)
Cameron Cundiff shares @accesslint/mcp - an MCP server that runs deterministic browser-side accessibility scans via JavaScript, then passes violations to an AI agent to locate and fix source code. Scales with violation count, not codebase size.

Keyboard Intelligent Guided Test
Wilco Fiers (Deque Systems)
Deque releases an AI-automated keyboard accessibility test as part of their Intelligent Guided Tests. Checks focusability, contrast on focused elements, keyboard traps, and more. One button to start, AI pre-fills human judgement calls for confirmation.

Quern - Debug server for AI-assisted iOS & Android development
Jerimiah Ham (Quern.dev)
Open-source local debug server for macOS that gives AI coding agents live access to iOS simulators, Android emulators, and physical devices - logs, network traffic, screenshots, and UI control via 76 MCP tools. Works with Claude Code, Cursor, and any MCP client.

Gherkin Guidelines for AI
Andrew Knight aka Automation Panda (GitHub Repo)
Open-sourced opinionated Gherkin guidelines designed to work as AI context. Covers structure, collaboration, and automation-ready specs. Use as a checklist by hand or attach to an AI agent for writing BDD scenarios that hold up over time.

Falling behind on test automation and AI adoption? DevClarity's QA Practice gets your team up to speed fast - with hands-on training, proven workflows, and measurable results within 30 days.

Techniques & Tutorials

Why I Stopped Writing Docs in Confluence (And Let AI Agents Keep Them Honest)
Amit Rawat (LinkedIn)
Amit Rawat describes replacing Confluence with a Markdown + MkDocs + GitLab Pages pipeline, using Claude Code hooks to auto-update documentation when product changes happen. Documentation updates become a side effect of development work instead of a separate manual task.

The Harness Era of QA Has Started
Yusuf Tayman (LinkedIn)
Introduces "harness engineering" - a system of guides, sensors, and memory wrapped around AI models for QA. Covers a Manual Tester Agent and Playwright Agent with modes for exploration, comparison, self-healing, and refactoring. Built on browser-harness from the Browser Use team.

MCP Servers for Test Automation: 4 Practical Use Cases
Michal Slezak (testingplus.me)
Walks through four MCP servers for test automation: ReportPortal (pulling test results into Cursor for debugging), Jira (auto-creating bug tickets from failed tests), GitHub (creating repos and PRs from LLM output), and Playwright MCP. Also covers security risks and token efficiency.

Test-Driven Claude Code Skill Development Updated (video)
Antony Marcano (Substack)
Antony Marcano is evolving his Test-Driven Agentic Behaviours framework into an open source project called stagentic-ai. Each test run acts as a rehearsal where subagents execute test steps, and you iterate on SKILL.md directions until the agent performs as desired.

Ensemble Testing of an LLM Application (video)
Anupam Krishnamurthy (YouTube)
A follow-up to a previous session on LLM-as-a-judge testing, this meetup explores ensemble testing approaches for LLM applications, addressing how to trust an LLM judge by using multiple evaluators together.
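To make the "multiple evaluators together" idea concrete, here is a minimal sketch of one way to combine judges: a simple majority vote with an agreement threshold. This is my own illustration, not the approach from the video. The names (ensemble_judge, EnsembleVerdict, min_agreement) are hypothetical, and the three judges are deterministic stand-ins for what would be real LLM calls in practice.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A judge takes (question, answer) and returns a verdict label such as "pass" or "fail".
# In a real setup each judge would wrap an LLM call; that part is assumed, not shown.
Judge = Callable[[str, str], str]


@dataclass
class EnsembleVerdict:
    verdict: str       # majority label, or "needs_human_review" when agreement is too low
    agreement: float   # share of judges that voted for the majority label
    votes: List[str]   # raw votes, kept for debugging disagreements


def ensemble_judge(
    question: str,
    answer: str,
    judges: Sequence[Judge],
    min_agreement: float = 0.6,
) -> EnsembleVerdict:
    """Collect one vote per judge and return the majority label.

    If the majority is weaker than min_agreement, escalate to a human
    instead of trusting a split decision.
    """
    if not judges:
        raise ValueError("at least one judge is required")
    votes = [judge(question, answer) for judge in judges]
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement < min_agreement:
        label = "needs_human_review"
    return EnsembleVerdict(verdict=label, agreement=agreement, votes=votes)


# Hypothetical stand-in judges so the sketch runs without any API keys.
def strict_judge(question: str, answer: str) -> str:
    return "pass" if "4" in answer.split() else "fail"


def lenient_judge(question: str, answer: str) -> str:
    return "pass" if "4" in answer else "fail"


def length_judge(question: str, answer: str) -> str:
    return "pass" if len(answer) < 40 else "fail"


if __name__ == "__main__":
    judges = [strict_judge, lenient_judge, length_judge]
    result = ensemble_judge("What is 2 + 2?", "It is 4.", judges)
    print(result.verdict, round(result.agreement, 2), result.votes)
```

The useful part is less the vote itself and more the escalation path: when the judges split, the sketch sends the case to a person rather than picking a winner, which keeps a human in the loop exactly where the evaluators are least reliable.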

Research & Data

Thinking Fast, Slow, and Artificial: How AI is Reshaping Human Reasoning
Maryia Tuleika (LinkedIn)
Research paper distinguishes "cognitive offloading" (using AI output as input to your own thinking) from "cognitive surrender" (accepting AI output without evaluation). Finds surrender is more likely in people with higher AI trust and lower deliberate thinking. Directly relevant to reviewing AI-generated test suites.

DORA and AI Capabilities
Janet Gregory, Lisa Crispin (Agile Testing Fellowship)
Janet Gregory and Lisa Crispin break down DORA's 2025 AI Capabilities Model for testing and quality professionals. Key takeaway: AI amplifies whatever already exists - healthy practices get stronger, dysfunction gets worse.

Foundations

You're Testing AI Wrong
Simon Prior (Lead Test Include)
Argues most teams test software that contains AI, not the AI itself. Proposes a five-dimension framework: reliability, performance, governance, integration, and explainability. Also flags the meta-problem of using AI to test AI - introducing a second source of hallucination risk.

Creating a Playwright Framework with AI
Callum (Ryan) Akehurst-Ryan
Detailed playbook for building a Playwright E2E test framework from scratch using Claude Code and the Playwright MCP. Covers scoping, writing tests one at a time, reviewing assertions, hardening against flakes, and maintaining consistency. Honest assessment of what AI does well and where it needs human guidance.

Quick Links

AI for QA - Ben Fellows (YouTube) - Practical AI for QA content in roughly 5-minute videos. Give the channel a sub.

Kane CLI - Browser Automation Tool For Testing (TestMu AI) - There is a free plan with 200 credits to try this tool out.

AnyWayData Adds CLI for Test Data Generation (AnyWayData) - Added CLI support for AI tools to generate data.

Until Next Time

If something in this issue made you think differently about how your team approaches AI in testing, pass it along. The best conversations about AI and QA are happening in Slack channels and stand-ups, not just newsletters.

Have something worth featuring? Reply and send it my way. I read every link.

Thanks for reading,
Butch Mayhew