AI in QA - Issue #10

Sponsorship

Interested in sponsoring AI in QA? Reach out.

AI in QA Review Livestream Monday 7:00 AM CT

This week, between trainings at DevClarity, I spent part of my time building new content and thinking deeply about the first principles of using AI tools. At the same time, I saw Vitaly Sharovatov requesting feedback on research he is working on. The topic: Agentic Workflows: Engineering Principles for Non-Engineers.

As I read through the 11 chapters he had already published, I found myself nodding along, having experienced many of the same issues while working with AI tools. One idea that wasn't yet covered was measuring the changes you make to your agent as you maintain it. The principle: One Variable at a Time (OVAT), or controlled variables in the scientific method.

The idea is that when you change multiple variables simultaneously, you can't isolate which change caused the observed result. It's foundational to experimental design.

When you work with an agent, the agent/skill, harness, model, and prompt are all separate variables that can affect the outcome. You wouldn't think that a Claude Code (terminal app) update could change behavior, but it is one of the variables as well.

Following that feedback, Vitaly covered this principle in Chapter 15: Change one thing at a time. It includes a very helpful prompt that builds a hook into Claude Code, which tells you on startup if your version of Claude Code was updated or your model was changed.
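
To make the one-variable-at-a-time idea concrete, here is a minimal Python sketch. It is not the hook from Chapter 15 and does not call any Claude Code API. The function names (`fingerprint`, `changed_variables`, `check_run`) and the version and model strings are hypothetical, and how you look up your real tool version and model name is left to you. The sketch records the variables for a run and reports which ones differ from the previous run.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(tool_version: str, model: str, prompt_text: str) -> dict:
    """Collect the variables that can change an agent's behavior."""
    return {
        "tool_version": tool_version,
        "model": model,
        # Hash the prompt so any edit, however small, shows up as a change.
        "prompt_sha256": hashlib.sha256(prompt_text.encode("utf-8")).hexdigest(),
    }


def changed_variables(previous: dict, current: dict) -> list:
    """Return the sorted names of variables whose values differ."""
    keys = set(previous) | set(current)
    return sorted(k for k in keys if previous.get(k) != current.get(k))


def check_run(state_file: Path, current: dict) -> list:
    """Compare with the last recorded fingerprint, then record the new one."""
    if state_file.exists():
        previous = json.loads(state_file.read_text())
    else:
        previous = current  # first run: nothing to compare against
    changes = changed_variables(previous, current)
    state_file.write_text(json.dumps(current, indent=2))
    return changes


# Hypothetical values, for illustration only.
baseline = fingerprint("1.0.0", "model-a", "You are a QA agent.")
candidate = fingerprint("1.0.1", "model-a", "You are a careful QA agent.")
print(changed_variables(baseline, candidate))
# prints ['prompt_sha256', 'tool_version']
```

If the list has more than one entry, as it does here, the comparison is confounded: a difference in output could come from the tool update or from the prompt edit. Revert one change and rerun before drawing a conclusion.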

Headlines & Launches

Claude Code For QA; The Agentic Workflow That Will Save You 100+ Hours
Alex Krylov (Medium)
How to build an agentic QA workflow using Claude Code with sub-agents, custom skills, and MCP integrations, covering test case generation, manual testing, and automation as a gated state machine.

How I Use AI to Code
Chris Parsons (chrismdp.com)
Comprehensive guide to agentic engineering in 2026: harness beats prompts, train the AI instead of reviewing diffs, spec the problem not the solution, and invest in skill files + feedback loops. Covers toolchain recommendations, context management, and the shift from approver to trainer role.

Three Paths to Agentic UI Automation
Amit Rawat (amitrawat.dev)
Breaks down three architectures for AI-driven browser automation: AI-generated code, live driving (MCP), and composed tools, with real cost/latency/reliability benchmarks. Argues that composed tools (Path 3) win for production workloads.

Mutation Testing + AI: Eliminating 2000+ Mutants in One Night
Tim Ottinger (LinkedIn)
Tim Ottinger introduced mutation testing on a long-running project, found 3200+ surviving mutants, then used LLMs (Warp2) to categorize and triage them. Eliminated 2000+ mutants in one night. Emphasizes human-in-the-loop review and using AI for augmentation, not automation.

Tools & Frameworks

Agentic QA Tool
Daniel Hogg (LinkedIn)
Multi-agent QA tool built on LangGraph that plans, writes, executes, and reports tests for APIs, web UIs, and Python codebases. Uses an Orchestrator → Planner → Writer → Executor → Reporter pipeline with self-healing and reuse scoring. GitHub Repo

Claude Code Skills as Executable UML
Antony Marcano (Substack)
Using PlantUML diagrams instead of prose as executable skill definitions for Claude Code reduces prohibitive clauses from 24 to 11, improves reliability, and costs only ~27% more output tokens.

Bonsai: Cultivate AI You Can Trust
James Kip (LinkedIn)
Classical QA breaks when your team ships AI in production. Bonsai teaches the frontier-team playbook: eval design, LLM-as-judge calibration, RAG retrieval separation, agent trajectory scoring, red-teaming as a regression flywheel, and drift-aware CI.

Agentic Test Explorer: AI-Driven Exploratory Testing Framework
Oscar Barrios (GitHub)
Open-source agnostic AI-driven exploratory test framework built with LangGraph that intelligently explores, tests, and validates any web application using MCP tools.

Falling behind on test automation and AI adoption? DevClarity's QA Practice gets your team up to speed fast - with hands-on training, proven workflows, and measurable results within 30 days.

Foundations

It's an AI Tool - What Does That Really Mean?
Maaret Pyhajarvi (LinkedIn)
Argues the industry needs to redesign tool governance for AI - classifying what actually makes up an AI tool rather than default-denying access based on the AI label. Analyzes Vibium as a case study.

Tips for Writing Playwright Tests with Cursor
Filip Hric (LinkedIn)
10 practical tips for using Cursor with Playwright covering project rules, workflows, screenshots, Playwright MCP for debugging, and why planning outside the AI tool matters.

AI Usage Guidance Infographic
Callum Akehurst-Ryan (LinkedIn)
Infographic distilling guidance on writing AI prompts to keep output quality high. Aimed at beginners using AI for any purpose, not just engineers. Covers prompt writing best practices including setting red lines.

Techniques & Tutorials

Lessons Learned from Building an AI-Enabled Test Automation Repo
Swati Seela (Medium) SOFTWARE TESTING WEEKLY
Six practical lessons from building an AI-powered API test pipeline from Jira tickets to pytest. Key takeaway: garbage requirements produce garbage tests faster, and speed doesn't equal completeness.

Designing a custom AI agent for repetitive QA workflows
Rimple Sharma (Medium, Amex GBT Technology)
Part 1 of a series on building a custom ReAct-loop agent for repetitive QA workflows at Amex GBT. Key insight: the agent's effectiveness depends on structured instructions, not the model. Compares plain text vs structured prompts with clear examples.

Using Claude Code for Manual Testing
Samantha Louw (LinkedIn)
Practical guide to using Claude Code for manual testing, not automation. Covers using the Chrome extension to navigate live pages, investigate against PRDs, surface missing acceptance criteria and edge cases. Highlights auth inheritance via browser session and MCP tool integration.

What Makes Good Context For AI-Generated Tests?
Puja Jagani (Medium) SOFTWARE TESTING NOTES
A practical framework for feeding AI the right context when generating tests: understand what you're testing, identify what can break, then organize that knowledge into structured prompts. Includes a concrete payment processing example.

Research & Data

AI and Testing: Improving Retrieval Quality, Part 4
Jeff Nyman (Tester Stories)
Nyman tests the same RAG system on a different paper and discovers document structure matters as much as query type. A good reminder that retrieval quality depends on how information is presented, not just what you ask.

Quick Links

wkdomains: macOS Browser for Developers and Coding Agents
wkdomains (GitHub)
A macOS browser that exposes your live browser state (screenshots, DOM, XHR, cookies) to coding agents like Claude Code and Codex via a local API. Human browses normally, agent gets structured access to the same page.

Codex can now use the in-app browser to test your app at different viewport sizes
James Sun (X)
OpenAI's Codex can now test apps at different viewport sizes using its in-app browser, taking screenshots at key breakpoints. Supports hiding the browser to speed up testing by 1-2x.

Claude Code: A full AI testing workflow in the IDE. How did you actually set it up?
/r/QualityAssurance (Reddit)
Reddit thread in r/QualityAssurance on scaling Claude Code from script authoring into a full AI testing workflow: generating tests from PRDs, authoring scripts, and auto-fixing CI failures. Good for seeing how others are actually using it.

If something in this issue made you think differently about how your team approaches AI in testing, pass it along. The best conversations about AI and QA are happening in Slack channels and stand-ups, not just newsletters.

Have something worth featuring? Reply and send it my way. I read every link.

Thanks for reading,
Butch Mayhew