Welcome to the first of hopefully many newsletters to come!
If you're in QA right now, you already know - the tooling landscape is changing faster than most of us can keep up with. Every week there's a new AI-powered testing tool, a new model that changes what's possible, or a new debate about what testing even looks like going forward. I'm deep in it, learning as I go, and AI in QA is where I'll share what's actually worth paying attention to.
The goal is simple: high-impact articles and tools, no filler.
Headlines & Launches
Automate repository tasks with GitHub Agentic Workflows
via GitHub Blog
GitHub launches Agentic Workflows (technical preview): AI-authored repository automation written in plain Markdown and executed by coding agents like Claude Code, Copilot, and Codex inside GitHub Actions. Safety guardrails include read-only defaults and human review before merges.
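To make the "plain Markdown" idea concrete, a workflow in this preview is roughly a Markdown file under .github/workflows/ with YAML frontmatter for triggers and a natural-language body the agent executes. This is a loose sketch based on the announcement; the exact frontmatter keys and values are an assumption and may differ from the preview docs:

```markdown
---
# Hypothetical frontmatter - check the gh-aw docs for the real keys.
on:
  issues:
    types: [opened]
permissions: read-all   # read-only by default per the announcement
engine: copilot         # which coding agent runs the body
---

When a new issue is opened, read it, suggest appropriate labels,
and post a short triage comment linking any related open issues.
```

The body is the interesting part: it's a prompt, not a script, so the agent decides how to accomplish the task within the declared permissions.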
DoesQA Automation Intelligence: Context-Aware Test Reporting
via LinkedIn (Sam Smith)
DoesQA launches Automation Intelligence, a context-aware reporting feature that generates human-readable summaries for failed test runs - designed to understand your application and tests deeply rather than just piping errors through an LLM. Available on all plans at no extra cost.
wdiox: A WebdriverIO Agentic CLI for Browser and Appium Sessions
via LinkedIn (Vince Graics)
wdiox brings the agentic CLI testing pattern to WebdriverIO - stateless commands with element refs that work across both browser and mobile sessions, built on top of the @wdio/mcp snapshot engine, and ships as an installable agent skill.
Falling behind on test automation and AI adoption? DevClarity's QA Practice gets your team up to speed fast - with hands-on training, proven workflows, and measurable results within 30 days.
Tools & Frameworks
Playwright Best Practices Skill by Currents
via Currents
Currents releases an open-source Agent Skill that gives AI coding agents specialized knowledge for writing, debugging, and maintaining production-grade Playwright tests - covering locators, assertions, the Page Object Model, flaky-test detection, accessibility, component testing, CI/CD, and 30+ other topics, progressively disclosed to preserve the agent's context window.
skills-check is now available as a GitHub Action
via LinkedIn (Chris Williams)
skills-check now runs in CI pipelines via GitHub Actions, bringing automated auditing of SKILL.md files used by AI coding agents - covering version drift, security issues like hallucinated packages and prompt injection, token budget analysis, and metadata linting.
Introducing Upright: Open Source Synthetic Monitoring with Playwright
via 37signals Dev
37signals open-sources Upright, the synthetic monitoring system they built to replace Pingdom for Basecamp and HEY - a Rails engine supporting Playwright browser probes with video recording on failure, HTTP, SMTP, and traceroute probes, deployable across multiple global sites for around $110/month.
Arize Phoenix: Open Source AI Evaluation and Observability Platform
via LinkedIn (Qambar)
A hands-on walkthrough of Arize Phoenix's LLM evaluation workflow - defining datasets, configuring models, creating evaluators, and inspecting results. Rated A tier for teams building serious LLM systems that need custom evals plus observability, though the flexibility can slow down teams that just want one obvious path.
Techniques & Tutorials
Agentic Testing with Playwright CLI Skill
via LinkedIn (Debbie O'Brien)
Install the Playwright CLI Skill with one command, then prompt any AI agent to navigate a site and automatically generate Playwright tests complete with video recordings and traces - no manual test writing required.
Git Worktrees Done Right
via Ahmed El Gabri
A practical guide to the bare repo git worktree pattern that enables multiple AI coding agents to work in parallel without conflicts - each agent gets an isolated worktree while sharing a single git history, eliminating stash cycles and enabling safe parallel automation.
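The bare-repo pattern the article describes can be sketched end to end with a throwaway repo (all paths and branch names here are illustrative; in practice you'd start from `git clone --bare <remote-url>`):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# Stand-in for your real project repo.
git init -q -b main seed && cd seed
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
cd ..

# 1. Clone bare: one shared object store and history, no working tree.
git clone -q --bare seed project.git
cd project.git

# 2. One worktree per agent, each on its own branch - no stash cycles,
#    no checkout conflicts between parallel agents.
git worktree add -q -b agent-a ../agent-a
git worktree add -q -b agent-b ../agent-b

git worktree list
```

Each agent now works in its own directory (`agent-a/`, `agent-b/`) while commits from all of them land in the single shared history inside `project.git`.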
Lessons from Building Claude Code: Seeing like an Agent
via X (Thariq)
Thariq from the Claude Code team shares hard-won lessons on designing an agent's tool set - the key insight being that tools should be shaped to what the model naturally excels at, not arbitrary use cases, illustrated through the evolution of the AskUserQuestion tool across three design attempts.
GitHub Copilot in QA: What Nobody Tells You Until You're Already Using It
via Medium (Swati Sabharwal)
A QA lead shares six months of hard-won lessons using GitHub Copilot on a real team: why prompts with full context produce dramatically better results, how to know when Copilot has started hallucinating mid-task (stop after 5 terminal commands), and why its first answer is not always the best one.
Reverse Gherkin: Business-Readable Playwright Test Reports Without the BDD Overhead
via Alister Scott
Flips the traditional BDD workflow - instead of writing Gherkin first and mapping it to code, you write native Playwright tests using test.describe and test.step with Given/When/Then step titles, and a custom reporter automatically generates plain-language, business-readable markdown reports when the tests run.
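The reporting half of the idea is easy to picture in isolation. The real version is a Playwright custom reporter walking test.step() results; this toy sketch (all names illustrative) just shows the transform - step titles already written as Given/When/Then become a business-readable markdown report:

```typescript
// A minimal stand-in for the step data a Playwright reporter would receive.
type StepResult = { title: string; ok: boolean };

// Turn one test's Given/When/Then steps into a markdown section.
function stepsToMarkdown(testTitle: string, steps: StepResult[]): string {
  const lines = [`### ${testTitle}`];
  for (const s of steps) {
    lines.push(`- ${s.ok ? "PASS" : "FAIL"}: ${s.title}`);
  }
  return lines.join("\n");
}

const report = stepsToMarkdown("Customer can check out", [
  { title: "Given a signed-in customer with one item in the cart", ok: true },
  { title: "When they complete the checkout flow", ok: true },
  { title: "Then they see an order confirmation", ok: false },
]);
console.log(report);
```

Because the step titles are ordinary strings passed to test.step(), the business-readable layer costs nothing at authoring time - no feature files, no step-definition glue.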
Research & Data
AI and Testing: Evaluating the Future
via Tester Stories (Jeff Nyman) - Part 1 of a series
The first in a series on testing AI-infused enterprise applications - arguing that we're not testing LLMs in isolation but complex systems built on RAG pipelines, agentic workflows, and structured output patterns. Introduces the three pillars of AI evaluation (answer relevance, faithfulness, contextual precision) and the LLM-as-a-Judge approach using tools like DeepEval and RAGAS.
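To see the shape of one of those pillars, here is a deliberately crude sketch of a faithfulness score - what fraction of an answer is grounded in the retrieved context. Real setups (DeepEval, RAGAS) use an LLM judge rather than token overlap; this toy function, with made-up example strings, only illustrates the metric's contract:

```typescript
// Lowercase word tokens, deduplicated.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// Fraction of answer tokens that also appear in the retrieved context.
// A real faithfulness eval asks a judge model whether each claim in the
// answer is supported; token overlap is just a cheap structural stand-in.
function faithfulness(answer: string, context: string): number {
  const a = tokenize(answer);
  const c = tokenize(context);
  if (a.size === 0) return 1; // an empty answer makes no unsupported claims
  let grounded = 0;
  for (const tok of a) if (c.has(tok)) grounded++;
  return grounded / a.size;
}

console.log(
  faithfulness(
    "the refund window is 30 days",
    "our refund window is 30 days for all plans",
  ),
);
```

The other two pillars follow the same pattern - a scoring function over (question, answer, context) - which is exactly the interface the LLM-as-a-Judge tools expose.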
AI Testing Evaluator
via The Test Eye (Rikard Edgren)
Rikard Edgren releases a benchmarking playground for AI testing agents: two versions of a simple web app - one containing 23 intentionally introduced bugs, and a "fixed" version that adds 5 new ones - to evaluate whether AI agents can find bugs, generate regression suites, verify fixes, and catch newly introduced defects.
Quick Links
Kintsugi: Agentic Development Environment for Claude Code (LinkedIn (Cole Medin))
Claude Code Remote Control: Access Sessions from iOS/Android (LinkedIn (Jeff Morgan))
/simplify: Claude Code Parallel Agent Cleanup Command (X (Boris Cherny))
GitNexus: Browser-Based AST Knowledge Graph with MCP Interface (LinkedIn (Andre Lindenberg))
Memory Engine: Persistent, Searchable Memory for AI Agents (LinkedIn (John Pruitt))
AppraiseJS: Unified Test Authoring and Automation Platform (LinkedIn (Hasnat Jamil))
If this resonates with you, chances are it'll resonate with someone on your team too. Sometimes the best thing you can do for a colleague is share something you learned that just might make their day a little easier.
