Merged
414 changes: 26 additions & 388 deletions README.md

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/demo/index.md
@@ -8,7 +8,7 @@ Live HTML reports generated by pytest-codingagents with [pytest-aitest](https://
|--------|-------------|
| [Basic Report](basic-report.html) | Core file operations — create modules, refactor code |
| [Model Comparison](model-comparison-report.html) | Same tasks across different models (GPT-5.2 vs Claude Opus 4.5) |
- | [Instruction Testing](instruction-testing-report.html) | How different system prompts affect agent behavior |
+ | [Instruction Testing](instruction-testing-report.html) | How different instructions affect agent behavior |

## How These Are Generated

2 changes: 1 addition & 1 deletion docs/getting-started/index.md
@@ -40,4 +40,4 @@ async def test_hello_world(copilot_run, tmp_path):
## What's Next

- [Model Comparison](model-comparison.md) — Compare different models
- - [Instruction Testing](instruction-testing.md) — Test different system prompts
+ - [Instruction Testing](instruction-testing.md) — Test different instructions
4 changes: 2 additions & 2 deletions docs/getting-started/instruction-testing.md
@@ -1,6 +1,6 @@
# Instruction Testing

- Test how different system prompts affect agent behavior.
+ Test how different instructions affect agent behavior.

## Example

@@ -31,5 +31,5 @@ async def test_coding_style(copilot_run, tmp_path, style, instructions):
## What To Look For

- **Do instructions change behavior?** Compare output files across styles.
- - **Token efficiency** — Verbose prompts may cost more but produce better results.
+ - **Token efficiency** — Verbose instructions may cost more but produce better results.
- **Tool patterns** — Does TDD-style actually write tests first?
29 changes: 21 additions & 8 deletions docs/how-to/aitest-integration.md
@@ -1,18 +1,12 @@
# pytest-aitest Integration

- Get HTML reports with AI-powered analysis by integrating with [pytest-aitest](https://github.com/sbroenne/pytest-aitest).
+ HTML reports with AI-powered analysis are included automatically — [pytest-aitest](https://github.com/sbroenne/pytest-aitest) is a core dependency.

> **See example reports:** [Basic Report](../demo/basic-report.html) · [Model Comparison](../demo/model-comparison-report.html) · [Instruction Testing](../demo/instruction-testing-report.html)

- ## Installation
-
- ```bash
- uv add "pytest-codingagents[aitest]"
- ```

## How It Works

- When `pytest-aitest` is installed, `CopilotResult` automatically bridges to `AgentResult`, enabling:
+ When tests run, `CopilotResult` automatically bridges to `AgentResult`, enabling:

- **HTML reports** with test results, tool call details, and Mermaid sequence diagrams
- **AI analysis** with failure root causes and improvement suggestions tailored for coding agents
@@ -27,4 +21,23 @@ Use pytest-aitest's standard CLI options:
uv run pytest tests/ --aitest-html=report.html --aitest-summary-model=azure/gpt-5-mini
```

Or configure in `pyproject.toml`:

```toml
[tool.pytest.ini_options]
addopts = """
--aitest-html=aitest-reports/report.html
--aitest-summary-model=azure/gpt-5.2-chat
"""
```

No code changes needed — the integration is automatic via the plugin system.

## Analysis Prompt Hook

The plugin implements the `pytest_aitest_analysis_prompt` hook to inject Copilot-specific context into AI analysis:

- **Coding-agent framing** — the AI analyzer understands it's evaluating models, instructions, and tools (not MCP servers)
- **Dynamic pricing table** — model pricing data is pulled live from litellm's `model_cost` database, so cost analysis stays current without manual updates

This happens automatically — no configuration needed.
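
The pricing-table part can be sketched in isolation. Below is a hypothetical rendering of a litellm-style `model_cost` mapping as a Markdown table suitable for an analysis prompt; the function name and table layout are illustrative, and the plugin's actual hook code may differ:

```python
def pricing_table(model_cost: dict) -> str:
    """Render a litellm-style cost mapping as a Markdown table.

    Each value is expected to carry per-token costs, as entries in
    litellm's `model_cost` do; models without pricing are skipped.
    """
    rows = [
        "| Model | Input $/1M tok | Output $/1M tok |",
        "|-------|----------------|-----------------|",
    ]
    for model, info in sorted(model_cost.items()):
        cin = info.get("input_cost_per_token")
        cout = info.get("output_cost_per_token")
        if cin is None or cout is None:
            continue  # no pricing data for this model
        rows.append(f"| {model} | {cin * 1e6:.2f} | {cout * 1e6:.2f} |")
    return "\n".join(rows)
```

Because the table is built from the live cost mapping at analysis time, new models and price changes show up without touching the prompt text.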
96 changes: 96 additions & 0 deletions docs/how-to/cli-tools.md
@@ -0,0 +1,96 @@
# CLI Tool Testing

Test that GitHub Copilot can operate command-line tools correctly.

## Basic Usage

Give the agent a task that requires CLI tools and verify the outcome:

```python
from pytest_codingagents import CopilotAgent


async def test_git_init(copilot_run, tmp_path):
agent = CopilotAgent(
instructions="Use git commands as requested.",
working_directory=str(tmp_path),
)
result = await copilot_run(
agent,
"Initialize a git repo, create a .gitignore for Python, and make an initial commit.",
)
assert result.success
assert (tmp_path / ".git").is_dir()
assert (tmp_path / ".gitignore").exists()
```

## Verifying File Output

Check that CLI operations produce the expected files:

```python
async def test_project_scaffold(copilot_run, tmp_path):
agent = CopilotAgent(
instructions="Create project structures as requested.",
working_directory=str(tmp_path),
)
result = await copilot_run(
agent,
"Create a Python package called 'mylib' with __init__.py, "
"a pyproject.toml using hatchling, and a tests/ directory.",
)
assert result.success
assert (tmp_path / "src" / "mylib" / "__init__.py").exists() or (
tmp_path / "mylib" / "__init__.py"
).exists()
assert (tmp_path / "pyproject.toml").exists()
```

## Testing Complex Workflows

Chain multiple CLI operations into a single task:

```python
async def test_git_workflow(copilot_run, tmp_path):
agent = CopilotAgent(
instructions="Perform git operations as requested. Use git commands directly.",
working_directory=str(tmp_path),
)
result = await copilot_run(
agent,
"Initialize a git repo, create hello.py with print('hello'), "
"add it, commit with message 'initial', then create a 'feature' branch.",
)
assert result.success
assert (tmp_path / "hello.py").exists()
```
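
File existence checks only go so far for git workflows. Git state can be inspected directly with a small subprocess helper (a sketch, assuming `git` 2.13 or later is on PATH; this is not part of the plugin API):

```python
import subprocess


def git_branches(repo_path) -> list[str]:
    """Return the local branch names of the repository at `repo_path`."""
    out = subprocess.run(
        ["git", "branch", "--format=%(refname:short)"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    )
    # one branch name per line; a repo with no commits has no branches
    return out.stdout.split()
```

After the workflow task above, `assert "feature" in git_branches(tmp_path)` confirms the branch was actually created, not just that the files exist.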

## Comparing Instructions for CLI Tasks

Test which instructions produce better CLI usage:

```python
import pytest
from pytest_codingagents import CopilotAgent


@pytest.mark.parametrize(
"style,instructions",
[
("minimal", "Execute commands as requested."),
("guided", "You are a DevOps assistant. Use standard CLI tools. "
"Always verify operations succeed before proceeding."),
],
)
async def test_cli_instructions(copilot_run, tmp_path, style, instructions):
agent = CopilotAgent(
name=f"cli-{style}",
instructions=instructions,
working_directory=str(tmp_path),
)
result = await copilot_run(
agent,
"Create a Python virtual environment and install requests",
)
assert result.success
```
6 changes: 5 additions & 1 deletion docs/how-to/index.md
@@ -2,5 +2,9 @@

Practical guides for common tasks.

- - [pytest-aitest Integration](aitest-integration.md) — Get HTML reports with AI analysis
- [MCP Server Testing](mcp-servers.md) — Test that the agent uses your custom tools
- [Skill Testing](skills.md) — Measure the impact of domain knowledge
- [CLI Tool Testing](cli-tools.md) — Verify the agent operates CLI tools correctly
- [Tool Control](tool-control.md) — Restrict tools with allowlists and blocklists
+ - [pytest-aitest Integration](aitest-integration.md) — HTML reports with AI analysis
- [CI/CD Integration](ci-cd.md) — Run tests in GitHub Actions
128 changes: 128 additions & 0 deletions docs/how-to/mcp-servers.md
@@ -0,0 +1,128 @@
# MCP Server Testing

Test that GitHub Copilot can discover and use your MCP server tools correctly.

## Basic Usage

Attach an MCP server to a `CopilotAgent` and verify the agent calls the right tools:

```python
from pytest_codingagents import CopilotAgent


async def test_database_query(copilot_run, tmp_path):
agent = CopilotAgent(
instructions="Use the database tools to answer questions.",
working_directory=str(tmp_path),
mcp_servers={
"my-db-server": {
"command": "python",
"args": ["-m", "my_db_mcp_server"],
}
},
)
result = await copilot_run(agent, "List all users in the database")
assert result.success
assert result.tool_was_called("list_users")
```

## Multiple Servers

Attach multiple MCP servers to test interactions between tools:

```python
async def test_multi_server(copilot_run, tmp_path):
agent = CopilotAgent(
instructions="Use the available tools to complete tasks.",
working_directory=str(tmp_path),
mcp_servers={
"database": {
"command": "python",
"args": ["-m", "db_server"],
},
"notifications": {
"command": "node",
"args": ["notification_server.js"],
},
},
)
result = await copilot_run(
agent,
"Find users who signed up today and send them a welcome notification",
)
assert result.success
assert result.tool_was_called("query_users")
assert result.tool_was_called("send_notification")
```

## A/B Server Comparison

Compare two versions of the same MCP server to validate improvements:

```python
import pytest
from pytest_codingagents import CopilotAgent

SERVER_VERSIONS = {
"v1": {"command": "python", "args": ["-m", "my_server_v1"]},
"v2": {"command": "python", "args": ["-m", "my_server_v2"]},
}


@pytest.mark.parametrize("version", SERVER_VERSIONS.keys())
async def test_server_version(copilot_run, tmp_path, version):
agent = CopilotAgent(
name=f"server-{version}",
instructions="Use the available tools to answer questions.",
working_directory=str(tmp_path),
mcp_servers={"my-server": SERVER_VERSIONS[version]},
)
result = await copilot_run(agent, "What's the current inventory count?")
assert result.success
assert result.tool_was_called("get_inventory")
```

The AI analysis report will compare pass rates and tool usage across server versions, highlighting which performs better.

## Verifying Tool Arguments

Check not just that a tool was called, but how it was called:

```python
async def test_correct_arguments(copilot_run, tmp_path):
agent = CopilotAgent(
instructions="Use database tools to query data.",
working_directory=str(tmp_path),
mcp_servers={
"db": {"command": "python", "args": ["-m", "db_server"]},
},
)
result = await copilot_run(agent, "Find users named Alice")
assert result.success

# Check specific tool calls
calls = result.tool_calls_for("query_users")
assert len(calls) >= 1
assert "Alice" in str(calls[0].arguments)
```

## Environment Variables

Pass environment variables to your MCP server process:

```python
agent = CopilotAgent(
instructions="Use the API tools.",
working_directory=str(tmp_path),
mcp_servers={
"api-server": {
"command": "python",
"args": ["-m", "api_server"],
"env": {
"API_KEY": "test-key",
"DATABASE_URL": "sqlite:///test.db",
},
}
},
)
```