Merged
README.md: 2 changes (1 addition, 1 deletion)
@@ -51,7 +51,7 @@ Every test run produces an HTML report with AI-powered insights:
 - **Diagnoses failures** — root cause analysis with suggested fixes
 - **Compares models** — leaderboards ranked by pass rate and cost
 - **Evaluates instructions** — which instructions produce better results
-- **Recommends improvements** — actionable changes to tools, prompts, and skills
+- **Recommends improvements** — actionable changes to tools, instructions, and skills

 ```bash
 uv run pytest tests/ --aitest-html=report.html --aitest-summary-model=azure/gpt-5.2-chat
docs/how-to/index.md: 2 changes (1 addition, 1 deletion)
@@ -2,8 +2,8 @@

 Practical guides for common tasks.

-- [MCP Server Testing](mcp-servers.md) — Test that the agent uses your custom tools
 - [Skill Testing](skills.md) — Measure the impact of domain knowledge
+- [MCP Server Testing](mcp-servers.md) — Test that the agent uses your custom tools
 - [CLI Tool Testing](cli-tools.md) — Verify the agent operates CLI tools correctly
 - [Tool Control](tool-control.md) — Restrict tools with allowlists and blocklists
 - [pytest-aitest Integration](aitest-integration.md) — HTML reports with AI analysis
docs/index.md: 23 changes (17 additions, 6 deletions)
@@ -1,6 +1,14 @@
 # pytest-codingagents

-Automated testing for GitHub Copilot configurations. Test your instructions, MCP servers, skills, and models — then get AI analysis that tells you **why** things failed and **what to fix**.
+**Combatting cargo cult programming in Agent Instructions, Skills, and Custom Agents for GitHub Copilot and other coding agents since 2026.**
+
+Everyone's copying instruction files from blog posts, pasting "you are a senior engineer" into agent configs, and adding skills they found on Reddit. But does any of it actually work? Are your instructions making your coding agent better — or just longer? Is that skill helping, or is the agent ignoring it entirely?
+
+**You don't know, because you're not testing it.**
+
+pytest-codingagents is a pytest plugin that runs your actual coding agent configuration against real tasks — then uses AI analysis to tell you **why** things failed and **what to fix**.
+
+Currently supports **GitHub Copilot** via [copilot-sdk](https://www.npmjs.com/package/github-copilot-sdk). More agents (Claude Code, etc.) coming soon.

 ```python
 from pytest_codingagents import CopilotAgent
@@ -27,10 +35,11 @@ Authenticate via `GITHUB_TOKEN` env var (CI) or `gh auth status` (local).

 | Capability | What it proves | Guide |
 |---|---|---|
-| **Instructions** | Custom instructions produce the desired behavior | [Getting Started](getting-started/index.md) |
+| **Instructions** | Your custom instructions actually produce the desired behavior — not just vibes | [Getting Started](getting-started/index.md) |
+| **Skills** | That domain knowledge file is helping, not being ignored | [Skill Testing](how-to/skills.md) |
 | **Models** | Which model works best for your use case and budget | [Model Comparison](getting-started/model-comparison.md) |
+| **Custom Agents** | Your custom agent configurations actually work as intended | [Getting Started](getting-started/index.md) |
 | **MCP Servers** | The agent discovers and uses your custom tools | [MCP Server Testing](how-to/mcp-servers.md) |
-| **Skills** | Domain knowledge improves agent performance | [Skill Testing](how-to/skills.md) |
 | **CLI Tools** | The agent operates command-line interfaces correctly | [CLI Tool Testing](how-to/cli-tools.md) |
 | **Tool Control** | Allowlists and blocklists restrict tool usage | [Tool Control](how-to/tool-control.md) |

@@ -43,15 +52,17 @@ Every test run produces an HTML report with AI-powered insights:
 - **Diagnoses failures** — root cause analysis with suggested fixes
 - **Compares models** — leaderboards ranked by pass rate and cost
 - **Evaluates instructions** — which instructions produce better results
-- **Recommends improvements** — actionable changes to tools, prompts, and skills
+- **Recommends improvements** — actionable changes to tools, instructions, and skills

 ```bash
 uv run pytest tests/ --aitest-html=report.html --aitest-summary-model=azure/gpt-5.2-chat
 ```

-## Next Steps
+## Documentation
+
+Full docs at **[sbroenne.github.io/pytest-codingagents](https://sbroenne.github.io/pytest-codingagents/)** — API reference, how-to guides, and demo reports.

 - [Getting Started](getting-started/index.md) — Install and write your first test
-- [How-To Guides](how-to/index.md) — MCP servers, skills, CLI tools, and more
+- [How-To Guides](how-to/index.md) — Skills, MCP servers, CLI tools, and more
 - [Demo Reports](demo/index.md) — See real HTML reports with AI analysis
 - [API Reference](reference/api.md) — Full API documentation
mkdocs.yml: 4 changes (2 additions, 2 deletions)
@@ -1,5 +1,5 @@
 site_name: pytest-codingagents
-site_description: "A pytest plugin for testing coding agents via their native SDKs. The agent is the test harness, not the thing being tested."
+site_description: "A pytest plugin for testing coding agent configurations — instructions, skills, custom agents, and models — against real tasks."
 site_url: https://sbroenne.github.io/pytest-codingagents
 site_author: sbroenne
 repo_url: https://github.com/sbroenne/pytest-codingagents
@@ -76,8 +76,8 @@ nav:
 - Instruction Testing: getting-started/instruction-testing.md
 - How-To Guides:
 - Overview: how-to/index.md
-- MCP Server Testing: how-to/mcp-servers.md
 - Skill Testing: how-to/skills.md
+- MCP Server Testing: how-to/mcp-servers.md
 - CLI Tool Testing: how-to/cli-tools.md
 - Tool Control: how-to/tool-control.md
 - pytest-aitest Integration: how-to/aitest-integration.md