Intelligent browser testing powered by GPT-4o - Write test steps in natural language, let AI generate and execute the Playwright code.
- 🧠 AI-Powered: GPT-4o generates Playwright code from natural language descriptions
- 💾 Smart Caching: Zero-cost reruns with intelligent code caching
- 🔄 Auto-Retry: Configurable retry strategies with error context learning
- 📦 StepsPacks: Organize tests into reusable, isolated test suites
- 🎯 Flexible HTML Cleaning: Optimize context sent to AI by removing irrelevant elements
- 📊 Detailed Reporting: JSON and HTML reports with token usage and cost tracking
- 🔧 Mock Mode: Debug workflows without API costs
- ⚡ Multiple Strength Levels: Balance reliability vs. cost with onlycache/medium/high modes
- Installation
- Quick Start
- Configuration
- CLI Options
- How It Works
- StepsPacks
- Examples
- Cost Optimization
- Troubleshooting
- Contributing
- Node.js 16+
- npm or yarn
- OpenAI API key (Azure OpenAI or standard OpenAI)
# Clone repository
git clone <your-repo-url>
cd pw-ai-smartpeg
# Install dependencies
npm install
# Configure API key
cp .env.example .env
# Edit .env and add your OPENAI_API_KEY

Create a .env file:

OPENAI_API_KEY=your_azure_openai_key_here

Edit aidriven-settings.json:
{
"execution": {
"entrypoint_url": "https://your-site.com",
"headless": false,
"steps_file": "aidriven-steps.json"
},
"ai_agent": {
"type": "gpt-4o",
"endpoint": "https://your-endpoint.openai.azure.com/openai/deployments/gpt-4o",
"cost_input_token": "0.000005",
"cost_output_token": "0.00002",
"cost_cached_token": "0.0000025"
}
}

Edit aidriven-steps.json:
{
"steps": [
{
"sub_prompt": "Click the login button",
"timeout": "5000"
},
{
"sub_prompt": "Fill username with test@example.com and password with SecurePass123",
"timeout": "3000"
},
{
"sub_prompt": "Click submit and wait for dashboard",
"timeout": "8000"
}
]
}

# First run (generates cache)
node index.js --strength medium
# Subsequent runs (uses cache, zero cost)
node index.js --strength onlycache
# High reliability mode (3 attempts per step)
node index.js --strength high

aidriven-settings.json:
| Field | Description | Example |
|---|---|---|
| `execution.entrypoint_url` | Starting URL for the test | `"https://example.com"` |
| `execution.headless` | Run browser in headless mode | `false` |
| `execution.steps_file` | Path to steps JSON file | `"aidriven-steps.json"` |
| `ai_agent.type` | AI model to use | `"gpt-4o"` |
| `ai_agent.endpoint` | Azure OpenAI endpoint | `"https://..."` |
| `ai_agent.cost_input_token` | Cost per input token | `"0.000005"` |
| `ai_agent.cost_output_token` | Cost per output token | `"0.00002"` |
| `ai_agent.cost_cached_token` | Cost per cached token | `"0.0000025"` |
aidriven-steps.json:
{
"steps": [
{
"id": "73443201", // Auto-generated hash (optional)
"sub_prompt": "Your task description in natural language",
"timeout": "10000" // Milliseconds to wait after step
}
]
}

Step Fields:

- `sub_prompt` (required): Natural language description of the action
- `timeout` (optional): Pause duration after step completion (default: 10000 ms)
- `id` (auto-generated): MD5 hash used for caching
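For illustration, here is one way those fields could be normalized before execution (default timeout applied, MD5 id generated); the helper name is hypothetical and the real loader may differ:

```js
// Illustrative sketch of normalizing steps before execution; the real loader may differ.
const crypto = require("crypto");
const fs = require("fs");

function loadSteps(path = "aidriven-steps.json") {
  const { steps } = JSON.parse(fs.readFileSync(path, "utf8"));
  return steps.map((step) => ({
    // id: 8-character MD5 hash of the prompt, used as the cache key
    id: step.id || crypto.createHash("md5").update(step.sub_prompt).digest("hex").slice(0, 8),
    sub_prompt: step.sub_prompt,
    // timeout: pause after the step completes; defaults to 10000 ms
    timeout: Number(step.timeout ?? 10000),
  }));
}
```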
--strength <level>

| Level | Attempts | Cache | Use Case |
|---|---|---|---|
| `onlycache` | 1 | Required | Zero-cost reruns of stable tests |
| `medium` | 2 | Preferred | Default balance of cost and reliability |
| `high` | 3 | Preferred | Complex workflows needing retries |
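Conceptually, the strength level boils down to a small retry/cache policy derived from the table above. The sketch below is illustrative only; the actual RetryManager may be organized differently:

```js
// Illustrative mapping from --strength to a retry/cache policy (see the table above).
function strengthPolicy(strength) {
  switch (strength) {
    case "onlycache":
      return { attempts: 1, cacheRequired: true };  // abort if any step lacks cached code
    case "high":
      return { attempts: 3, cacheRequired: false }; // retries feed error context back to the AI
    case "medium":
    default:
      return { attempts: 2, cacheRequired: false };
  }
}
```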
# Disable caching (always generate fresh code)
--nocache
# Mock mode (no API calls, uses hardcoded actions)
--mock
# Use a StepsPack
--stepspack <name>
# Generate HTML report
--html-report
# Stop execution on first error
--stop-on-error
# Customize HTML cleaning
--htmlclean-remove <items>
--htmlclean-keep <items>

Control what elements are removed from HTML before sending to AI:
# Default (recommended)
node index.js
# Remove everything except specific items
--htmlclean-remove all --htmlclean-keep id,class
# Custom cleaning
--htmlclean-remove comments,script,style,svg,img,longtext

Available items: comments, script, style, svg, img, inlinestyle, attributes, longtext, all
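The cleaning is conceptually similar to the JSDOM-based sketch below, which removes a few of the default items (scripts, styles, svg, img, comments). This is an illustration, not the project's actual cleaner, which also handles inlinestyle, attributes, longtext, and all:

```js
// Illustrative JSDOM-based cleaner; not the project's actual implementation.
const { JSDOM } = require("jsdom");

function cleanHtml(html, removeTags = ["script", "style", "svg", "img"]) {
  const dom = new JSDOM(html);
  const doc = dom.window.document;

  // Drop whole elements by tag name
  doc.querySelectorAll(removeTags.join(",")).forEach((el) => el.remove());

  // Drop HTML comments
  const walker = doc.createTreeWalker(doc, dom.window.NodeFilter.SHOW_COMMENT);
  const comments = [];
  while (walker.nextNode()) comments.push(walker.currentNode);
  comments.forEach((c) => c.remove());

  return dom.serialize();
}
```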
┌─────────────┐
│  index.js   │  Entry point & CLI
└──────┬──────┘
       │
       ├──► ConfigManager   Load settings & steps
       ├──► CodeGenerator   Generate Playwright code via AI
       ├──► TestExecutor    Execute generated code
       ├──► RetryManager    Handle retry logic
       ├──► TestReporter    Log results & analytics
       └──► TestRunner      Orchestrate execution
1. Initialization
   - Parse CLI arguments
   - Load configuration from settings file
   - Initialize OpenAI client (or MockOpenAI)
   - Configure components with retry/cache strategy

2. Step Preparation
   - Read steps from JSON file
   - Generate unique hash ID for each step
   - Validate cache availability (for `onlycache` mode)
3. Browser Launch
   - Launch Chromium via Playwright
   - Navigate to the entry point URL
   - Wait for initial page load
4. Step Execution Loop

   For each step:

   a) Cache Lookup (if enabled):

      const cachePath = `./generated/aidriven/step-${hash}.js`;
      // If found → use cached code, skip the API call

   b) AI Code Generation (if cache miss):

      // Extract clean HTML from the page
      const cleanedHtml = await executor.extractCleanHtml(page);
      // Generate code via GPT-4o
      const code = await codeGenerator.generate(step, {
        taskDescription: step.subPrompt,
        url: page.url(),
        html: cleanedHtml,
        errorMessage: previousError // On retry
      });

   c) Code Execution:

      // Wrap in an async function with page & expect
      const fn = eval(`(async (page, expect) => { ${code} })`);
      await fn(page, expect);

   d) Retry Logic (see the sketch after this list):
      - If failed and attempts remain: retry with error context
      - High strength: includes the previous error in the prompt
      - Updates token usage counters

   e) Post-Step Actions:
      - Log results
      - Wait for the configured timeout
      - Proceed to the next step
5. Completion
   - Close the browser
   - Calculate total usage (tokens + cost)
   - Save the execution log with analytics
   - Update the steps file with generated IDs
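The retry logic in step 4d can be sketched as follows. The `codeGenerator`/`executor` calls mirror the snippets above but are assumptions, not the project's exact API:

```js
// Sketch of step 4d: retry a failed step, feeding the previous error back into
// the next generation prompt. Assumes codeGenerator and executor are in scope.
async function runStepWithRetries(step, page, expect, attempts) {
  let lastError = null;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const code = await codeGenerator.generate(step, {
        taskDescription: step.sub_prompt,
        url: page.url(),
        html: await executor.extractCleanHtml(page),
        errorMessage: lastError ? lastError.message : undefined, // error context on retry
      });
      const fn = eval(`(async (page, expect) => { ${code} })`);
      await fn(page, expect);
      return { ok: true, attempt };
    } catch (err) {
      lastError = err;
    }
  }
  return { ok: false, error: lastError };
}
```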
// Step ID generation (MD5 hash of prompt)
this.id = crypto.createHash("md5")
.update(subPrompt)
.digest("hex")
.slice(0, 8);
// Cache validation (onlycache mode)
if (missingCache.length > 0) {
  console.error("❌ Cache mancante");
  process.exit(1);
}

Cache benefits:
- Zero API cost on reruns
- Deterministic test behavior
- Faster execution (no AI generation)
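For reference, the lookup/write cycle amounts to something like the sketch below; the `step-<hash>.js` path pattern matches the one shown above, everything else is illustrative:

```js
// Minimal sketch of the cache read/write cycle; helper names are hypothetical.
const fs = require("fs");
const path = require("path");

const CACHE_DIR = "./generated/aidriven";

function readCachedCode(stepId) {
  const file = path.join(CACHE_DIR, `step-${stepId}.js`);
  return fs.existsSync(file) ? fs.readFileSync(file, "utf8") : null; // null = cache miss
}

function writeCachedCode(stepId, code) {
  fs.mkdirSync(CACHE_DIR, { recursive: true });
  fs.writeFileSync(path.join(CACHE_DIR, `step-${stepId}.js`), code, "utf8");
}
```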
Organize tests into isolated, reusable suites with their own configuration.
stepspacks/
└── my-test-suite/
    ├── .env            # Optional: API keys for this pack
    ├── settings.json   # Pack-specific settings
    ├── steps.json      # Test steps
    ├── media/          # Assets (images, files)
    └── generated/      # Cached code & reports
        ├── step-*.js
        └── run-logs.json
# Create pack directory
mkdir -p stepspacks/login-flow
# Create settings
cat > stepspacks/login-flow/settings.json << 'EOF'
{
"execution": {
"entrypoint_url": "https://myapp.com/login",
"headless": false
},
"ai_agent": {
"type": "gpt-4o",
"endpoint": "https://your-endpoint.openai.azure.com/...",
"cost_input_token": "0.000005",
"cost_output_token": "0.00002",
"cost_cached_token": "0.0000025"
}
}
EOF
# Create steps
cat > stepspacks/login-flow/steps.json << 'EOF'
{
"steps": [
{
"sub_prompt": "Enter email user@example.com",
"timeout": "3000"
},
{
"sub_prompt": "Enter password SecurePass123 and click login",
"timeout": "5000"
}
]
}
EOF

# Execute specific pack
node index.js --stepspack login-flow --strength medium
# With HTML report
node index.js --stepspack login-flow --html-report
# Available packs
ls stepspacks/
# test-livrea  change-image-livrea  guru-valutaz

✅ Isolated environments (separate cache, reports, configs)
✅ Reusable across projects
✅ Easy to share and version control
✅ Pack-specific API keys via .env
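As a hedged sketch, `--stepspack <name>` can be thought of as resolving pack-local files following the directory layout above; the helper name and error text here are hypothetical:

```js
// Hypothetical resolver for --stepspack <name>, following the layout above.
const fs = require("fs");
const path = require("path");

function resolvePack(name) {
  const root = path.join("stepspacks", name);
  if (!fs.existsSync(root)) throw new Error(`StepsPack not found: ${root}`);
  return {
    envFile: path.join(root, ".env"),             // optional pack-specific API keys
    settingsFile: path.join(root, "settings.json"),
    stepsFile: path.join(root, "steps.json"),
    generatedDir: path.join(root, "generated"),   // pack-local cache & reports
  };
}
```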
aidriven-steps.json:
{
"steps": [
{
"sub_prompt": "Wait for page load, click on the login link in header",
"timeout": "3000"
},
{
"sub_prompt": "Fill email with user@example.com and password with SecurePass123!",
"timeout": "2000"
},
{
"sub_prompt": "Click the login submit button",
"timeout": "5000"
},
{
"sub_prompt": "Verify welcome message contains 'Hello, User'",
"timeout": "3000"
}
]
}

Execution:
# First run (generates cache)
node index.js --strength medium
# Subsequent runs (uses cache, $0.00 cost)
node index.js --strength onlycache

{
"steps": [
{
"sub_prompt": "Navigate to dropdown Analysis > Smart compare",
"timeout": "5000"
},
{
"sub_prompt": "Select date range 'Last 30 days' from filter",
"timeout": "3000"
},
{
"sub_prompt": "If the export button is disabled, throw error 'Test failed: Export unavailable', otherwise click it",
"timeout": "8000"
},
{
"sub_prompt": "Wait for download notification to appear",
"timeout": "5000"
}
]
}

High reliability run:
node index.js --strength high --stop-on-error

{
"steps": [
{
"sub_prompt": "Click the three dots menu in profile section",
"timeout": "3000"
},
{
"sub_prompt": "Click edit photo button with id #btn_modifica_foto",
"timeout": "4000"
},
{
"sub_prompt": "Click choose file and select /path/to/image.png, wait 3 seconds, then click the enabled save button",
"timeout": "15000"
},
{
"sub_prompt": "Verify success message appears",
"timeout": "5000"
}
]
}

{
"steps": [
{
"sub_prompt": "If banner is visible, accept it by clicking Accept button",
"timeout": "3000"
},
{
"sub_prompt": "If text 'Non sono presenti funzioni extra' is found, throw error 'Test failed: No extra features available', otherwise click QUESTIONARIO ONLINE",
"timeout": "10000"
}
]
}

# First run (generates cache)
node index.js --strength medium
# All subsequent runs (zero cost)
node index.js --strength onlycache

Savings: 100% cost reduction on reruns
- Default `--strength medium` balances cost and reliability (2 attempts)
- Only use `--strength high` (3 attempts) for problematic workflows
- Reserve `--strength onlycache` for stable tests
Remove unnecessary HTML to reduce token usage:
# Aggressive cleaning (smallest context)
node index.js --htmlclean-remove all --htmlclean-keep id,class,data-testid
# Default (balanced)
node index.js --htmlclean-remove comments,script,style,svg,img,longtext

Check run-logs.json after execution:
{
"usage": {
"total_tokens": 12450,
"input_tokens": 10000,
"output_tokens": 2000,
"cached_tokens": 8500,
"calculated_cost": 0.0375
}
}

Cached tokens are 50% cheaper - Azure OpenAI automatically caches repeated prompt content.
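One plausible way `calculated_cost` could be derived from these counters and the `cost_*_token` settings is shown below; the tool's actual formula may differ, so treat this as an estimate:

```js
// Rough cost estimate from run-logs.json counters and the cost_*_token settings.
// Assumes cached tokens are input tokens billed at the discounted cached rate.
function estimateCost(usage, aiAgent) {
  const freshInput = usage.input_tokens - usage.cached_tokens;
  return (
    freshInput * Number(aiAgent.cost_input_token) +
    usage.cached_tokens * Number(aiAgent.cost_cached_token) +
    usage.output_tokens * Number(aiAgent.cost_output_token)
  );
}
```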
✅ Good: Concise and specific
{
"sub_prompt": "Click login button with id #btn_login"
}

❌ Bad: Overly verbose
{
"sub_prompt": "Please locate the login button on the page, which should be somewhere near the top of the form, and when you find it, click on it to proceed to the next step"
}

Scenario: 10-step workflow, ~1000 tokens per step
| Mode | API Calls | Total Tokens | Cached | Cost |
|---|---|---|---|---|
| `onlycache` | 0 | 0 | N/A | $0.00 ✨ |
| `medium` (all cached) | 0 | 0 | N/A | $0.00 ✨ |
| `medium` (no cache) | 10 | 20,000 | 5,000 | $0.40 |
| `high` (2 retries) | 12 | 24,000 | 6,000 | $0.48 |
Recommendation: Always run medium first to build cache, then use onlycache for CI/CD pipelines.
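For CI, a small pre-flight script (illustrative, not shipped with the project) can verify that every step already has cached code before an `onlycache` run, so the pipeline fails early instead of mid-test:

```js
// Illustrative CI pre-flight check: fail fast if any step lacks cached code.
const crypto = require("crypto");
const fs = require("fs");

const { steps } = JSON.parse(fs.readFileSync("aidriven-steps.json", "utf8"));
const missing = steps
  .map((s) => s.id || crypto.createHash("md5").update(s.sub_prompt).digest("hex").slice(0, 8))
  .map((id) => `./generated/aidriven/step-${id}.js`)
  .filter((file) => !fs.existsSync(file));

if (missing.length > 0) {
  console.error("Missing cache for:\n" + missing.join("\n"));
  process.exit(1);
}
```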
❌ ERRORE: Cache mancante per i seguenti step:
- Step 1: "Click login button"
  File atteso: ./generated/aidriven/step-aa9c1054.js
💡 Suggerimento: Esegui prima con --strength medium o --strength high

(English: "ERROR: Missing cache for the following steps … Expected file: … Tip: run first with --strength medium or --strength high.")

Solution:
# Generate cache first
node index.js --strength medium
# Then use onlycache
node index.js --strength onlycache

Causes:
- Page not fully loaded
- Dynamic selectors changed
- Element is hidden/inactive
Solutions:
- Increase the `timeout` value:

  {
    "sub_prompt": "Click submit button",
    "timeout": "10000"  // Increase from 5000
  }

- Try `--strength high` to retry with error context:

  node index.js --strength high

- Clear the cache if the page structure changed:

  node index.js --nocache --strength medium

- Check the generated code:

  cat ./generated/aidriven/step-{hash}.js
Check:
- Verify `cost_*_token` values in settings
- Review `run-logs.json` for the token breakdown
- Ensure cached tokens are counted correctly
--strength onlycache e --nocache sono opzioni incompatibili

Solution: Don't combine --strength onlycache with --nocache
This indicates the AI deliberately threw an error based on your conditional logic:
{
"sub_prompt": "If error message appears, throw error 'Test failed: Login unsuccessful'"
}

This is expected behavior - check your step conditions.
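For context, the code generated for such a conditional step typically looks something like this hypothetical example (the selector is an assumption; the AI picks it from the page HTML at generation time, and the snippet runs inside the `(async (page, expect) => { ... })` wrapper shown earlier):

```js
// Hypothetical example of generated code for the step above.
const errorMessage = page.locator(".error-message"); // selector is an assumption
if (await errorMessage.isVisible()) {
  throw new Error("Test failed: Login unsuccessful");
}
```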
- Enable Mock Mode (no API costs): `node index.js --mock`
- Check Generated Code: `cat ./generated/aidriven/step-{hash}.js`
- Review Execution Log: `cat ./generated/aidriven/run-logs.json | jq '.runs[-1]'`
- Run in Headed Mode: set `{ "execution": { "headless": false } }`
- Force Fresh Generation: `node index.js --nocache --strength high`
- Inspect HTML Sent to AI: `cat ./generated/aidriven/debug/post-clean/1.html`
- Never commit a `.env` file with real API keys
- Use `.gitignore` to exclude sensitive files (already configured)
- Avoid hardcoding credentials in step prompts:

  // ❌ Bad
  { "sub_prompt": "Login with password: MySecret123" }

  // ✅ Better
  { "sub_prompt": "Login with credentials from environment" }

- Review generated code before running on production systems
- Use headless mode cautiously on public sites
- Sanitize logs before sharing (they may contain sensitive selectors)
- Rotate API keys regularly
- Use StepsPack `.env` files for isolated credentials
Contributions are welcome! Areas for improvement:
- Environment variable injection in prompts (`${PROCESS.ENV.USERNAME}`)
- Parallel step execution for independent tests
- Visual regression testing integration
- Multiple browser support (Firefox, Safari, WebKit)
- Web UI for step configuration
- CI/CD integration examples (GitHub Actions, GitLab CI)
- Step dependency system (`depends_on: ["step-1", "step-2"]`)
- Conditional step execution based on results
- Screenshot capture on failure
- Video recording of test runs
- Playwright trace integration
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request with a detailed description
# Install dependencies
npm install
# Run linter
npm run lint
# Run tests
npm test

This project is licensed under the ISC License - see the LICENSE file for details.
- Playwright - Reliable browser automation framework
- OpenAI GPT-4o - AI code generation capabilities
- Azure OpenAI - Enterprise AI service
- Commander.js - CLI argument parsing
- JSDOM - HTML parsing and cleaning
For issues, feature requests, or questions:
- 📧 Open an issue on GitHub
- 💬 Check existing issues for solutions
- 📖 Review this README and inline documentation