feat: initial release of pytest-codingagents #1

Merged
sbroenne merged 11 commits into main from initial-release
Feb 11, 2026

Conversation

@sbroenne

Summary

A pytest plugin for testing coding agents via their native SDKs. Tests run against the real Copilot CLI — no mocks, no wrappers.

What's Included

Core Plugin

  • `CopilotAgent` dataclass — model, instructions, tools, skills, custom agents, MCP servers
  • `CopilotResult` — full observability: tool calls, token usage, cost, reasoning traces, subagent invocations
  • `copilot_run` fixture — execute prompts and capture structured results
  • `EventMapper` — maps all 38 SDK event types to structured data
  • Cost computation via litellm pricing (the SDK's cost field is unreliable)
  • Auto-confirm permissions for deterministic testing

pytest-aitest Integration

  • Automatic bridging of `CopilotResult` → `AgentResult` for HTML reports
  • Custom `pytest_aitest_analysis_prompt` hook with coding-agent-specific framing
  • Dynamic pricing table injected from litellm's `model_cost` data
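The dynamic pricing table can be sketched as a small function over a litellm-style `model_cost` mapping. The two-entry dict and its prices below are stand-ins for illustration; the real hook reads litellm's `model_cost` at runtime:

```python
# Stand-in for litellm.model_cost (keys match litellm's per-token cost fields;
# the numbers here are invented for illustration).
model_cost = {
    "gpt-5.2": {"input_cost_per_token": 2e-06, "output_cost_per_token": 8e-06},
    "claude-opus-4.5": {"input_cost_per_token": 5e-06, "output_cost_per_token": 2.5e-05},
}


def pricing_table(costs: dict[str, dict[str, float]]) -> str:
    """Render per-million-token prices as a markdown table."""
    rows = ["| Model | $/1M input | $/1M output |", "|---|---|---|"]
    for name, c in sorted(costs.items()):
        rows.append(
            f"| {name} | {c['input_cost_per_token'] * 1e6:.2f} "
            f"| {c['output_cost_per_token'] * 1e6:.2f} |"
        )
    return "\n".join(rows)


print(pricing_table(model_cost))
```

A table like this is what would replace a `{{PRICING_TABLE}}` placeholder in the analysis prompt; how the plugin actually formats it is not shown in this PR.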

Integration Tests (32 tests across 8 files)

File                      What It Tests
`test_basic.py`           File creation, code quality, refactoring
`test_models.py`          Model comparison (GPT-5.2 vs Claude Opus 4.5)
`test_matrix.py`          Model × Instructions grid
`test_instructions.py`    System prompt variants and constraints
`test_cli_tools.py`       Terminal commands, git operations
`test_custom_agents.py`   Custom agent delegation
`test_events.py`          Reasoning traces, permissions, usage tracking
`test_skills.py`          Skill directories, disabled skills

Unit Tests (37 tests)

Pure logic tests for agent config, event mapping, result properties, and plugin hooks.

Documentation

  • Full README with examples for every feature
  • mkdocs site with Getting Started, How-To, Reference, Contributing
  • 3 demo HTML reports linked from docs

Tooling

  • `scripts/run_all.py` — per-file report generation
  • Pre-commit hooks: ruff lint/format + pyright type checking
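Per-file report generation amounts to one pytest invocation per test file. A sketch in the spirit of `scripts/run_all.py` (the actual script's flags and report layout are not shown in this PR, so the `--html` option and paths here are assumptions):

```python
import sys
from pathlib import Path


def build_commands(tests_dir: Path, reports_dir: Path) -> list[list[str]]:
    """One pytest invocation per test file, each writing its own report."""
    cmds = []
    for test_file in sorted(tests_dir.glob("test_*.py")):
        report = reports_dir / f"{test_file.stem}.html"
        cmds.append([sys.executable, "-m", "pytest", str(test_file), f"--html={report}"])
    return cmds


if __name__ == "__main__":
    # Dry run: print the commands; the real script would execute each one.
    for cmd in build_commands(Path("tests"), Path("reports")):
        print(" ".join(cmd))
```

Running one pytest process per file is what keeps each HTML report scoped to a single test module, which is why `addopts` was removed from `pyproject.toml`.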

Key Design Decisions

  • timeout_s = 300s — coding agent tasks need more time than simple tool calls
  • Relaxed assertions — accept `powershell` or `run_in_terminal`, use `rglob` for file search
  • No azure-identity dep — not used in the codebase
  • `addopts` removed from pyproject.toml — per-file reports via `run_all.py` instead
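The relaxed-assertion idea can be shown concretely: accept any tool from an equivalence set, and locate created files with `rglob` rather than a fixed path. A self-contained sketch (`used_terminal` and the tool-name list are illustrative stand-ins for checks against a `CopilotResult`):

```python
import tempfile
from pathlib import Path

# Equivalent terminal tools: the agent may pick either name.
TERMINAL_TOOLS = {"powershell", "run_in_terminal"}


def used_terminal(result_tool_names: list[str]) -> bool:
    """True if any tool call in the result hit a terminal tool."""
    return any(name in TERMINAL_TOOLS for name in result_tool_names)


# Either terminal tool satisfies the assertion.
assert used_terminal(["read_file", "powershell"])
assert used_terminal(["run_in_terminal"])

# rglob finds the created file wherever the agent chose to put it.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "src" / "nested"
    target.mkdir(parents=True)
    (target / "hello.py").write_text("print('hi')\n")
    assert len(list(Path(tmp).rglob("hello.py"))) == 1

print("relaxed assertions hold")
```

Asserting on equivalence sets rather than exact tool names keeps tests stable across models that make different but equally valid choices.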

Stefan Broenner added 11 commits February 11, 2026 08:44
- Move agent creation inside test function (tmp_path scope)
- Fix config table: name defaults to 'copilot', mcp_servers is dict,
  custom_agents/skill_directories/disabled_skills are lists with defaults
- Add missing fields: system_message_mode, extra_config
- Add reasoning_effort 'xhigh' option
- Add Authentication section (GITHUB_TOKEN + gh CLI)
- Add CopilotResult properties reference table
- Add MCP server and tool control examples
- Remove ... placeholder code, use complete examples
- Fix same bug in docs/index.md
The old framing ('the agent is the test harness, not the thing being
tested') was copied from pytest-aitest. That's the opposite of this
project's purpose: pytest-codingagents evaluates coding agents — which
model works best, do instructions improve quality, are MCP tools used
correctly.
- Use snake_case keys in build_session_config() (SDK TypedDict, not camelCase)
- Handle ToolRequest objects in events.py (not plain dicts)
- Fix permission handler signature and return value
- Replace bogus SDK cost field with litellm model_cost computation
- Add azure-identity dependency for Azure AD authentication
- Add _lookup_model_cost() with name normalization (dot-to-dash for Claude)
- Update unit tests for new cost computation
- Add coding_agent_analysis.md prompt template for AI insights
- Implement pytest_aitest_analysis_prompt hook in plugin.py
- Build pricing table dynamically from litellm model_cost at runtime
- Replace hardcoded tier table with {{PRICING_TABLE}} placeholder
- Add test_plugin.py unit tests for hook and pricing table
- Parametrize tests across gpt-5.2 and claude-opus-4.5
- Add conftest.py with MODELS list and copilot marker
- Add CLI tools and skills integration test stubs
- Fix CopilotResult fields table (usage not token_usage, cost_usd is property)
- Fix Turn fields (remove phantom reasoning field)
- Fix ToolCall fields (add error, duration_ms, tool_call_id; fix arguments type)
- Fix UsageInfo fields (add cache_read_tokens, cost_usd, duration_ms)
- Fix SubagentInvocation fields (name/status/duration_ms, not agent_name/prompt/result)
- Add system_message_mode to configuration reference
- Add prompts/ directory to contributing project structure
- Update hooks section to describe dynamic pricing
- Update aitest integration doc with dynamic pricing bullet
- reference/result.md: add missing SubagentInvocation class
- reference/configuration.md: reasoning_effort type Literal, not str
- README.md: raw_events type list[Any], not list[dict]
… reports

Core changes:
- Increase default timeout_s from 120s to 300s
- Relax test assertions: accept powershell/run_in_terminal, use rglob
- Rewrite coding_agent_analysis.md prompt for visual impact (tables, scorecards)
- Remove unused azure-identity dependency
- Remove addopts from pyproject.toml (per-file reports via run_all.py)

Docs & demo:
- Add demo reports (basic, model-comparison, instruction-testing)
- Link demo reports from README, docs index, aitest integration page
- Add Demo Reports to mkdocs nav
- Update all docs to reflect timeout_s=300.0
- Add run_all.py to contributing docs

Tooling:
- Add scripts/run_all.py for per-file report generation
- Update tests/README.md with accurate test descriptions
@github-actions

Dependency Review

✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found.

Scanned Files

None

@sbroenne sbroenne merged commit 169b318 into main Feb 11, 2026
9 checks passed
@sbroenne sbroenne deleted the initial-release branch February 11, 2026 23:26
