feat: initial release of pytest-codingagents#1
Merged
added 11 commits
February 11, 2026 08:44
- Move agent creation inside test function (tmp_path scope)
- Fix config table: name defaults to 'copilot', mcp_servers is dict, custom_agents/skill_directories/disabled_skills are lists with defaults
- Add missing fields: system_message_mode, extra_config
- Add reasoning_effort 'xhigh' option
- Add Authentication section (GITHUB_TOKEN + gh CLI)
- Add CopilotResult properties reference table
- Add MCP server and tool control examples
- Remove ... placeholder code, use complete examples
- Fix same bug in docs/index.md
The old framing ('the agent is the test harness, not the thing being
tested') was copied from pytest-aitest. That's the opposite of this
project's purpose: pytest-codingagents evaluates coding agents — which
model works best, do instructions improve quality, are MCP tools used
correctly.
- Use snake_case keys in build_session_config() (SDK TypedDict, not camelCase)
- Handle ToolRequest objects in events.py (not plain dicts)
- Fix permission handler signature and return value
- Replace bogus SDK cost field with litellm model_cost computation
- Add azure-identity dependency for Azure AD authentication
- Add _lookup_model_cost() with name normalization (dot-to-dash for Claude)
- Update unit tests for new cost computation
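For reviewers, the cost lookup with dot-to-dash normalization described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names `lookup_model_cost` and `compute_cost_usd`, the `PRICES` table, and its per-token values are hypothetical placeholders. The only parts taken from litellm are the key names `input_cost_per_token` and `output_cost_per_token`, which are assumed to match its `model_cost` mapping.

```python
from __future__ import annotations

# Hypothetical pricing table shaped like litellm's model_cost mapping.
# The prices are made-up placeholders, not real rates.
PRICES = {
    "claude-opus-4-5": {"input_cost_per_token": 1e-06, "output_cost_per_token": 2e-06},
    "gpt-5.2": {"input_cost_per_token": 1e-06, "output_cost_per_token": 2e-06},
}


def lookup_model_cost(model: str, table: dict) -> dict | None:
    """Return the pricing entry for a model, trying the exact name first,
    then a dot-to-dash variant (e.g. claude-opus-4.5 -> claude-opus-4-5)."""
    for name in (model, model.replace(".", "-")):
        if name in table:
            return table[name]
    return None


def compute_cost_usd(model: str, input_tokens: int, output_tokens: int, table: dict) -> float:
    """Compute cost from token counts; unknown models cost 0.0 in this sketch."""
    entry = lookup_model_cost(model, table)
    if entry is None:
        return 0.0
    return (
        input_tokens * entry["input_cost_per_token"]
        + output_tokens * entry["output_cost_per_token"]
    )
```

Trying the exact name before the normalized one keeps names such as `gpt-5.2`, which legitimately contain a dot, from being rewritten.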
- Add coding_agent_analysis.md prompt template for AI insights
- Implement pytest_aitest_analysis_prompt hook in plugin.py
- Build pricing table dynamically from litellm model_cost at runtime
- Replace hardcoded tier table with {{PRICING_TABLE}} placeholder
- Add test_plugin.py unit tests for hook and pricing table
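The placeholder substitution above might look roughly like the sketch below. It assumes the prompt template is plain text containing `{{PRICING_TABLE}}` and that the pricing data has the same shape as litellm's `model_cost` mapping. The helper names `build_pricing_table` and `render_prompt`, the table columns, and the demo data are hypothetical and not taken from plugin.py.

```python
def build_pricing_table(model_cost: dict) -> str:
    """Render a Markdown table of per-million-token prices from a
    model_cost-style mapping (hypothetical column layout)."""
    rows = ["| Model | Input $/1M | Output $/1M |", "|---|---|---|"]
    for name in sorted(model_cost):
        entry = model_cost[name]
        # Missing price keys fall back to 0.0 so a partial entry does not fail.
        inp = entry.get("input_cost_per_token", 0.0) * 1_000_000
        out = entry.get("output_cost_per_token", 0.0) * 1_000_000
        rows.append(f"| {name} | {inp:.2f} | {out:.2f} |")
    return "\n".join(rows)


def render_prompt(template: str, model_cost: dict) -> str:
    """Fill the {{PRICING_TABLE}} placeholder with a freshly built table."""
    return template.replace("{{PRICING_TABLE}}", build_pricing_table(model_cost))


# Placeholder data for illustration only; not real prices.
demo_cost = {"demo-model": {"input_cost_per_token": 1e-06, "output_cost_per_token": 2e-06}}
rendered = render_prompt("Pricing:\n{{PRICING_TABLE}}\n", demo_cost)
```

Building the table at runtime means the prompt picks up new models whenever the pricing source is updated, without edits to the template.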
- Parametrize tests across gpt-5.2 and claude-opus-4.5
- Add conftest.py with MODELS list and copilot marker
- Add CLI tools and skills integration test stubs
- Fix CopilotResult fields table (usage not token_usage, cost_usd is property)
- Fix Turn fields (remove phantom reasoning field)
- Fix ToolCall fields (add error, duration_ms, tool_call_id; fix arguments type)
- Fix UsageInfo fields (add cache_read_tokens, cost_usd, duration_ms)
- Fix SubagentInvocation fields (name/status/duration_ms, not agent_name/prompt/result)
- Add system_message_mode to configuration reference
- Add prompts/ directory to contributing project structure
- Update hooks section to describe dynamic pricing
- Update aitest integration doc with dynamic pricing bullet
- reference/result.md: add missing SubagentInvocation class
- reference/configuration.md: reasoning_effort type Literal, not str
- README.md: raw_events type list[Any], not list[dict]
… reports

Core changes:
- Increase default timeout_s from 120s to 300s
- Relax test assertions: accept powershell/run_in_terminal, use rglob
- Rewrite coding_agent_analysis.md prompt for visual impact (tables, scorecards)
- Remove unused azure-identity dependency
- Remove addopts from pyproject.toml (per-file reports via run_all.py)

Docs & demo:
- Add demo reports (basic, model-comparison, instruction-testing)
- Link demo reports from README, docs index, aitest integration page
- Add Demo Reports to mkdocs nav
- Update all docs to reflect timeout_s=300.0
- Add run_all.py to contributing docs

Tooling:
- Add scripts/run_all.py for per-file report generation
- Update tests/README.md with accurate test descriptions
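The relaxed assertions mentioned under "Core changes" can be illustrated with the sketch below. It assumes the tests check two things: that the agent called one of several equivalent shell tools, and that an expected file exists somewhere under the working directory. The helper names and the sample file layout are hypothetical; only `pathlib.Path.rglob` and the two tool names come from the commit message.

```python
import tempfile
from pathlib import Path

# Tool names treated as equivalent shell tools, per the commit message.
ACCEPTED_SHELL_TOOLS = {"powershell", "run_in_terminal"}


def used_shell_tool(tool_names: list[str]) -> bool:
    """True if any recorded tool call used one of the accepted shell tools."""
    return any(name in ACCEPTED_SHELL_TOOLS for name in tool_names)


def find_created_files(root: Path, pattern: str) -> list[Path]:
    """Search recursively so the check passes wherever the agent put the file."""
    return sorted(root.rglob(pattern))


# Illustrative run: the file is nested two levels deep, which a
# top-level glob would miss but rglob finds.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src" / "pkg").mkdir(parents=True)
    (root / "src" / "pkg" / "calc.py").write_text("def add(a, b):\n    return a + b\n")
    found_names = [p.name for p in find_created_files(root, "calc.py")]
```

Both checks assert on the outcome rather than on one specific tool name or path, which different models may choose differently.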
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Summary
A pytest plugin for testing coding agents via their native SDKs. Tests run against the real Copilot CLI — no mocks, no wrappers.
What's Included
Core Plugin
pytest-aitest Integration
Integration Tests (32 tests across 8 files)
Unit Tests (37 tests)
Pure logic tests for agent config, event mapping, result properties, and plugin hooks.
Documentation
Tooling
Key Design Decisions
run_in_terminal, use rglob for file search
run_all.py instead