feat: add pytest AI validation and Robot Framework BDD testing samples #1
Michspirit99 wants to merge 5 commits into main
Conversation
- Reposition as THE comprehensive Python SDK sample collection
- Highlight E2E proof (15/15 passing) as key differentiator
- Better comparison table vs typical SDK examples
- Clearer value proposition upfront
- Add CI status and E2E badges
- Emphasize free tier (gpt-5-mini) compatibility
- Reorganize samples into clear categories
- Add acknowledgments for GitHub Copilot SDK
- pytest_ai_validation.py: AI-enhanced pytest with 4 test scenarios (code generation, bug detection, structured JSON, AI-as-judge)
- robot_copilot_library.py: Robot Framework keyword library wrapping Copilot SDK + standalone BDD runner (3 Gherkin scenarios)
- copilot_bdd.robot: BDD test suite with Given/When/Then syntax for AI agent testing (code gen, code review, JSON output, explanations)
- Update requirements.txt with pytest, pytest-asyncio, robotframework
- Update README: 15 -> 17 samples, add AI-Enhanced Testing section
- Update E2E runner to include new samples in scenario suite
Pull request overview
Adds new “AI-enhanced testing” sample scripts showing how to integrate Copilot SDK–powered agent validation into established test frameworks (pytest + Robot Framework BDD), and wires them into docs and the E2E scenario runner.
Changes:
- Added a pytest-based AI validation sample with reusable async scenarios and optional pytest integration.
- Added a Robot Framework keyword library + `.robot` BDD suite demonstrating Given/When/Then AI testing.
- Updated requirements, README catalog/claims, and the E2E scenario runner to account for the new samples.
Reviewed changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| `scripts/run_agent_scenarios.py` | Adds reporting of `.robot` files to the E2E scenario summary. |
| `samples/robot_copilot_library.py` | New Robot Framework keyword library + standalone BDD-style runner. |
| `samples/pytest_ai_validation.py` | New pytest AI-validation sample + standalone runner. |
| `samples/copilot_bdd.robot` | New Robot Framework BDD suite consuming the keyword library. |
| `requirements.txt` | Adds pytest/pytest-asyncio/robotframework dependencies. |
| `README.md` | Updates sample catalog and adds AI-Enhanced Testing section + badges/claims. |
samples/pytest_ai_validation.py (Outdated)
```python
import ast
import json
import re
import sys
```
`sys` is imported but never used, which will fail the repo's `ruff check` CI step. Remove the unused import (or use it explicitly if needed).
```diff
- import sys
```
```python
def json_should_be_valid(self) -> dict:
    """Assert that the last response is valid JSON and return the parsed dict."""
    try:
        data = json.loads(self._last_response)
        return data
    except json.JSONDecodeError as e:
        raise AssertionError(
            f"Invalid JSON: {e}\n\nRaw:\n{self._last_response[:300]}"
        )

def json_should_have_keys(self, *keys):
    """Assert that the parsed JSON contains all specified keys.

    Example (Robot):
        JSON Should Have Keys    name    age    email
    """
    data = self.json_should_be_valid()
    missing = set(keys) - set(data.keys())
    if missing:
```
`json_should_be_valid()` is annotated to return `dict` but returns whatever `json.loads()` yields, and `json_should_have_keys()` assumes `data.keys()` exists. If the model returns a JSON array/string, this will raise `AttributeError` instead of a clear assertion failure. Assert `isinstance(data, dict)` (and improve the error message) before using `.keys()`.
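A minimal sketch of the suggested guard, reusing the names from the snippet above (the surrounding keyword-library class and the exact error wording are assumptions):

```python
import json


def json_should_be_valid(self) -> dict:
    """Assert that the last response is a JSON object and return it."""
    try:
        data = json.loads(self._last_response)
    except json.JSONDecodeError as e:
        raise AssertionError(
            f"Invalid JSON: {e}\n\nRaw:\n{self._last_response[:300]}"
        )
    # New guard: a JSON array/string would otherwise fail later on .keys()
    # with an opaque AttributeError.
    if not isinstance(data, dict):
        raise AssertionError(
            f"Expected a JSON object, got {type(data).__name__}:\n"
            f"{self._last_response[:300]}"
        )
    return data
```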
@copilot open a new pull request to apply changes based on this feedback
```python
# Report .robot files (run via robot_copilot_library.py standalone)
for robot_file in sorted(samples_dir.glob("*.robot")):
    results.append(ScenarioResult(
        robot_file.stem,
        True,
        "SKIP - Run via: robot samples/copilot_bdd.robot (BDD scenarios tested through robot_copilot_library.py)"
    ))
```
Adding `.robot` files to `results` increases the scenario count reported by this runner (e.g., 17 `.py` samples + 1 `.robot` entry). This will desync the README's "17/17" E2E badge/transcript unless those are updated too. Consider excluding `.robot` from the scenario count here, or update the README/badge to match the new total.
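One hedged option for the exclusion route, assuming `ScenarioResult` exposes `passed` and `detail` fields (field names are hypothetical; adjust to the actual definition): count only executed scenarios toward the total and report SKIP entries separately.

```python
# Hypothetical field names; adapt to the real ScenarioResult dataclass.
counted = [r for r in results if not r.detail.startswith("SKIP")]
skipped = [r for r in results if r.detail.startswith("SKIP")]
passed_count = sum(1 for r in counted if r.passed)
print(f"{passed_count}/{len(counted)} scenarios passed "
      f"({len(skipped)} listed as SKIP)")
```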
```python
)
response = await session.send_and_wait({"prompt": prompt})
result = response.data.content.strip()
passed = result.upper().startswith("PASS")
```
`ai_judge()` asks for EXACT 'PASS'/'FAIL' on the first line, but the parser accepts any response starting with "PASS" (e.g., "PASS ✅" or "PASSING"). Parse the first line strictly (`splitlines()[0] == 'PASS'`/`'FAIL'`) and treat anything else as a test failure to keep the oracle deterministic.
```diff
- passed = result.upper().startswith("PASS")
+ first_line = result.splitlines()[0].strip() if result else ""
+ passed = first_line == "PASS"
```
```python
total = len(SCENARIOS)
print(f"Results: {passed} passed, {failed} failed out of {total}")
if failed == 0:
    print("\n✅ All AI validation tests passed!")
else:
    print("\n⚠️ Some tests failed (AI responses are non-deterministic)")
```
In standalone mode, failures only increment `failed` and print a warning, but `main()` always exits successfully. Since `scripts/run_agent_scenarios.py` treats "no exception" as PASS, this can produce false-green E2E results. Return an exit code and/or raise `SystemExit(1)` when `failed > 0`.
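A minimal sketch of that fix, assuming `main()` owns the `failed` counter (the scenario-running details are elided):

```python
import asyncio
import sys


async def main() -> int:
    failed = 0
    # ... run each scenario, incrementing `failed` on assertion errors ...
    # Propagate failures as a non-zero exit code so the E2E runner sees them.
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(asyncio.run(main()))
```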
samples/robot_copilot_library.py (Outdated)
```python
import sys
from pathlib import Path
```
`sys` and `Path` are imported but never used, which will fail the repo's `ruff check` CI step. Remove the unused imports (or use them explicitly).
```diff
- import sys
- from pathlib import Path
```
```python
if failed == 0:
    print("\n✅ All BDD scenarios passed!")
else:
    print("\n⚠️ Some scenarios failed")
```
In standalone mode, scenario failures only increment `failed` and print, but `main()` always completes successfully. Because the E2E runner marks "no exception" as PASS, this can hide failing BDD scenarios. Return a non-zero exit code and/or raise `SystemExit(1)` when `failed > 0`.
| print("\n⚠️ Some scenarios failed") | |
| print("\n⚠️ Some scenarios failed") | |
| raise SystemExit(1) |
```markdown
[](https://www.python.org/downloads/)
[](https://github.com/github/copilot-sdk)
[](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/ci.yml)
[](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
```
The E2E badge hard-codes 17/17, but `scripts/run_agent_scenarios.py` now also reports `.robot` files as scenarios, which will change the total. Update the badge (or adjust the runner) so the displayed E2E count matches the workflow output.
```diff
- [](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
+ [](https://github.com/Michspirit99/copilot-sdk-python/actions/workflows/agent-scenarios.yml)
```
@Michspirit99 I've opened a new pull request, #2, to work on those changes. Once the pull request is ready, I'll request review from you.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Summary
Adds 2 new sample categories (pytest and Robot Framework BDD) to demonstrate how AI agents powered by the Copilot SDK can be integrated into established test frameworks. Bumps the sample count from 15 → 17.
New Samples
🧪 pytest AI Validation (`pytest_ai_validation.py`)
- Deterministic checks (`ast.parse`, `json.loads`) combined with AI validation
- `copilot_session` pytest fixture for lifecycle management (see the sketch after these lists)
- Runs standalone (`python samples/pytest_ai_validation.py`) or with `pytest -v`
🤖 Robot Framework BDD (`robot_copilot_library.py` + `copilot_bdd.robot`)
- Gherkin-style scenarios: `Given I have a Copilot session / When I ask Copilot to generate code / Then the code should be valid Python`
- Runs standalone (`python samples/robot_copilot_library.py`) or with `robot samples/copilot_bdd.robot`
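A hypothetical sketch of the fixture-plus-hybrid-validation pattern in the pytest sample; the import path, `CopilotClient`, `start()`, `create_session()`, and `stop()` are assumptions (only `send_and_wait` and `response.data.content` appear in the diffs above), so the real sample may differ:

```python
import ast

import pytest
import pytest_asyncio

from copilot import CopilotClient  # assumed import path


@pytest_asyncio.fixture
async def copilot_session():
    """Start a Copilot session for one test and tear it down afterwards."""
    client = CopilotClient()  # assumed client API
    await client.start()
    try:
        yield await client.create_session({"model": "gpt-5-mini"})
    finally:
        await client.stop()


@pytest.mark.asyncio
async def test_generated_code_is_valid_python(copilot_session):
    response = await copilot_session.send_and_wait(
        {"prompt": "Write a Python function that reverses a string. Code only."}
    )
    code = response.data.content.strip()
    ast.parse(code)  # deterministic half of the hybrid check: must parse as Python
```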
Other Changes
- Added `pytest`, `pytest-asyncio`, `robotframework` to `requirements.txt`
- New `.robot` file with Given/When/Then syntax (4 test cases)
Local Test Results
Key Patterns Demonstrated
- pytest fixture for session lifecycle (`copilot_session`)