05 Day 2 · Session 10

Test Your Build with Playwright MCP

45 min · Browser automation → Smoke test → E2E tests → Visual QA

Objectives

By the end of this lab, you will have:

- Installed and verified the Playwright MCP server
- Smoke-tested the AI Trust Check app with AI-driven browser automation
- Generated and run Playwright E2E tests for the core user flows
- Captured screenshots for visual QA, including a mobile viewport
- Produced a test report that identifies coverage gaps and suggests follow-up tests

Why This Matters

You just built AI Trust Check in Lab 4 using the Compound Engineering workflow. Now you'll use another AI capability — browser automation via Playwright MCP — to test what you built. This closes the loop: AI builds, AI tests, human validates.

This is also a Compound AI System in action (see docs/03-frameworks.md) — an LLM + browser automation tool + test runner working as coordinated components. The same pattern applies to any multi-tool workflow.

Step 1: Install Playwright MCP · 5 min

Add the MCP server

claude mcp add playwright npx @playwright/mcp@latest

Verify installation

/mcp

You should see "playwright" listed.

First-run setup

Playwright will automatically install browsers on first use.
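If the browsers don't download automatically, you can trigger the install yourself:

npx playwright install chromium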

Step 2: Smoke Test Your App · 10 min

Make sure your AI Trust Check dev server is running:

npm run dev

Let Claude explore the app

You: "Using Playwright, open http://localhost:3000 and explore the AI Trust Check app. Navigate through: 1. The homepage — describe what you see 2. Search for 'ChatGPT' — does the search work? 3. Click on a tool card — does the detail page load? 4. Toggle between pricing tiers — does the safety card update? Take a screenshot at each step."

Review the screenshots

This is AI doing QA on AI-generated code — a compound AI system in action.

Step 3: Test the Core User Flow · 15 min

Generate E2E tests for AI Trust Check

You: "Create Playwright E2E tests for the AI Trust Check app running at http://localhost:3000. Test these critical flows: Test 1 — Homepage Search: - Navigate to homepage - Verify the search bar is visible - Type 'Claude' into search - Verify results appear - Verify at least one result contains 'Claude' Test 2 — Tool Detail Page: - Navigate to a tool detail page (e.g., ChatGPT) - Verify the tool name, vendor, and type are displayed - Verify the safety rating card is visible - Verify the data clearance grid shows 3 rows (Public, General, Confidential) Test 3 — Tier Toggle: - On a tool detail page, find the tier toggle buttons - Click a different tier - Verify the safety rating changes (or stays the same, but the UI responds) - Verify the data clearance grid updates Save the tests to tests/e2e/"

Run the tests

npx playwright test
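If @playwright/test isn't already a dev dependency in the AI Trust Check project, add it before running the suite:

npm install -D @playwright/test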

Fix any failures

If tests fail, let Claude fix them:

You: "These tests failed: [paste the error output]. Look at the actual app at localhost:3000 to understand the real selectors and structure, then fix the tests."

Step 4: Visual Regression · 10 min

Screenshot-based testing

You: "Using Playwright, take screenshots of the AI Trust Check app in these states: 1. Homepage — default view 2. Homepage — with search results for 'Cursor' 3. Tool detail page — ChatGPT, best tier selected 4. Tool detail page — ChatGPT, free tier selected 5. Mobile viewport (375px) — homepage Save all screenshots to tests/screenshots/ with descriptive names."

Review the screenshots

Look at each one:

- Is the layout intact, with nothing overlapping or cut off?
- Is any text truncated or unreadable?
- Do the tier states actually look different where they should?
- Does the mobile (375px) view hold up, or does the layout break?

Fix anything you spot

You: "Looking at the screenshots, I noticed [issue]. Fix this in the component code."

Step 5: Generate a Test Report · 5 min

You: "Run all the Playwright tests and generate a summary: - How many tests passed/failed? - What areas of the app have good coverage? - What's missing? - Suggest 3 more tests we should add."

Checkpoint

Commit your tests

git add tests/ && git commit -m "test: add Playwright E2E tests for AI Trust Check"

The Compound Loop

Notice what just happened:

Step                  Who                       What
Product brief         PM (human)                Defined what to build
Architecture          Developer + AI (Lab 2)    Defined how to build it
Build (unstructured)  AI (Lab 2)                First attempt — unreliable
Build (compound)      AI (Lab 4)                Second attempt — structured, reliable
Test                  AI (Lab 5)                Automated QA on the build
Review                Human                     Validates everything

Each step builds on the previous. The tests you just created can run on every future change — making subsequent work easier. That's compound engineering.

Reflection Questions

  1. How useful was Playwright for catching issues in the AI-generated code?
  2. Would you trust AI-generated E2E tests for your real projects? What would you change?
  3. How does this change your testing workflow?