feat(tools): Add GPT-5.1 ApplyPatch tool #1166
Conversation
…tch format

- Port reference parser/applicator into core module
- Integrate as ToolDefinition with executor and safe workspace I/O
- New example using openai/gpt-5.1-codex-mini to create/modify/delete FACTS.txt via tool

Co-authored-by: openhands <openhands@all-hands.dev>
…1 example; keep reasoning_summary=None to avoid org verification Co-authored-by: openhands <openhands@all-hands.dev>
…example: run gpt-5.1-codex-mini end-to-end
- Override to_responses_tool() to return minimal {type:function, name}
- Add validation_alias/serialization_alias so server-sent 'patch' is accepted
- Update example to register ApplyPatch and run create/modify/delete FACTS.txt
Co-authored-by: openhands <openhands@all-hands.dev>
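The alias handling described in this commit can be sketched with Pydantic v2. The model and field setup below is illustrative, not the SDK's actual definition; it just shows how `validation_alias`/`AliasChoices` lets the server-sent argument name `patch` (or a legacy `patch_text`) validate into one field:

```python
from pydantic import AliasChoices, BaseModel, ConfigDict, Field


class ApplyPatchArgs(BaseModel):
    """Illustrative action args: accept 'patch' or 'patch_text' on input."""

    model_config = ConfigDict(populate_by_name=True)

    patch: str = Field(
        validation_alias=AliasChoices("patch", "patch_text"),
        serialization_alias="patch",
    )


# Server-sent name 'patch_text' is accepted...
args = ApplyPatchArgs.model_validate({"patch_text": "*** Begin Patch"})
# ...but the model always serializes back out as 'patch'.
print(args.model_dump(by_alias=True))  # {'patch': '*** Begin Patch'}
```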
…ix example tool registration

- Add AliasChoices to accept server arg name 'patch'
- Provide minimal parameters in to_responses_tool to steer Responses tools
- Example: remove register_default_tools; import classes to register

Co-authored-by: openhands <openhands@all-hands.dev>
…input; send only tool outputs

- Remove assistant function_call emission in Message.to_responses_dict
- Stop passing back previous turn 'reasoning' items to avoid ordering errors
- This aligns with the OpenAI Responses pattern: client supplies only function_call_output

Co-authored-by: openhands <openhands@all-hands.dev>
…assthrough

- Keep assistant function_call in input so paired tool outputs validate
- Do not echo prior 'reasoning' items to avoid ordering constraints

Co-authored-by: openhands <openhands@all-hands.dev>
- Capture design choices, telemetry observations, and pending items
- Reference PR #1166 and branch name for continuity

Co-authored-by: openhands <openhands@all-hands.dev>
…tant function_call in input

- Reinstate reasoning passthrough to satisfy existing unit tests
- Keep assistant function_call items and pair with tool outputs in same request

Co-authored-by: openhands <openhands@all-hands.dev>
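The pairing rule these commits converge on can be illustrated with a minimal Responses-style input list. Item shapes are simplified and the call id and contents are made up; the point is only that each `function_call_output` must reference a `function_call` that is still present in the same request:

```python
# Illustrative Responses-style input: the assistant's function_call stays in
# the input so the paired function_call_output validates against its call_id.
input_items = [
    {"role": "user", "content": "Create FACTS.txt"},
    {
        "type": "function_call",
        "call_id": "call_123",
        "name": "apply_patch",
        "arguments": '{"patch": "*** Begin Patch"}',
    },
    {
        "type": "function_call_output",
        "call_id": "call_123",
        "output": "Patch applied successfully.",
    },
]

# The invariant the commits describe: every output id has a matching call id.
calls = {i["call_id"] for i in input_items if i.get("type") == "function_call"}
outputs = {i["call_id"] for i in input_items if i.get("type") == "function_call_output"}
assert outputs <= calls
```

Dropping the assistant `function_call` item (as one intermediate commit tried) breaks this invariant, which is why it was reinstated.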
Coverage Report
…_output ids

- Fix imports and formatting for pre-commit

Co-authored-by: openhands <openhands@all-hands.dev>
…esponses notes

- FileEditor example mirrors ApplyPatch for log comparison
- Notes: restore reasoning passthrough; pair assistant calls with tool outputs

Co-authored-by: openhands <openhands@all-hands.dev>
…t; fix pairing error. Telemetry: trim system instructions from logs and compact tool metadata for readability. Co-authored-by: openhands <openhands@all-hands.dev>
… rejection.

Aligned expected newline behavior with upstream apply_patch (no enforced trailing newline).

Co-authored-by: openhands <openhands@all-hands.dev>
…le and status update. Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
…ross action and executor.

- Update tests to pass 'patch' field
- Keep Responses tool schema minimal with required 'patch'

Co-authored-by: openhands <openhands@all-hands.dev>
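A "minimal schema with required 'patch'" could look roughly like the sketch below. This is illustrative, not the SDK's actual `to_responses_tool()` output; it only shows the shape of a Responses function tool whose single required parameter is `patch`:

```python
def to_responses_tool() -> dict:
    """Illustrative minimal Responses function-tool schema for apply_patch."""
    return {
        "type": "function",
        "name": "apply_patch",
        "parameters": {
            "type": "object",
            "properties": {"patch": {"type": "string"}},
            "required": ["patch"],
            "additionalProperties": False,
        },
    }


print(to_responses_tool()["parameters"]["required"])  # ['patch']
```

Keeping the schema this small is what "steer Responses tools" refers to in the commits above: the model sees one string argument rather than a richer action schema.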
… mention of 'patch_text'. Co-authored-by: openhands <openhands@all-hands.dev>
…y diff). Co-authored-by: openhands <openhands@all-hands.dev>
…via Tool(name="apply_patch"). Co-authored-by: openhands <openhands@all-hands.dev>
…d clarify Responses schema. Co-authored-by: openhands <openhands@all-hands.dev>
Co-authored-by: openhands <openhands@all-hands.dev>
I’m OpenHands-GPT-5.1, running this change end-to-end against the examples. I validated the updated preset:

cd software-agent-sdk
LLM_API_KEY="$LITELLM_API_KEY" LLM_BASE_URL="https://llm-proxy.eval.all-hands.dev" LLM_MODEL="openai/gpt-4.1-mini" uv run python examples/01_standalone_sdk/20_stuck_detector.py

Result: this confirms the preset wiring is intact and non-GPT-5 models still default to …
- Create multiple files (FACTS.txt, NOTES.md)
- Apply multi-hunk edits to a single file
- Apply a single patch that touches multiple files
- Mix add / update / delete operations in one patch
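All of the operations above ride on the upstream apply_patch envelope. As a rough illustration, here is a made-up multi-file patch and a tiny helper that lists its file operations; the helper is not the SDK's parser (real hunk handling follows the reference implementation), and the file names are invented:

```python
def split_operations(patch: str) -> list[tuple[str, str]]:
    """Return (kind, path) pairs for each file operation in an envelope."""
    ops = []
    for line in patch.splitlines():
        for kind in ("Add File", "Update File", "Delete File"):
            prefix = f"*** {kind}: "
            if line.startswith(prefix):
                ops.append((kind, line[len(prefix):]))
    return ops


# A single patch mixing add / update / delete across multiple files.
patch = """*** Begin Patch
*** Add File: FACTS.txt
+Paris is the capital of France.
*** Update File: NOTES.md
@@
-old line
+new line
*** Delete File: OLD.txt
*** End Patch
"""

print(split_operations(patch))
# [('Add File', 'FACTS.txt'), ('Update File', 'NOTES.md'), ('Delete File', 'OLD.txt')]
```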
Do we want this script to remain as example?
IMO, if the tool is not yet the default for GPT-5, the example makes sense: it gives people a way to see how they can use it. If it is the default (and according to your earlier comment, that's done), then maybe it's not very useful for client code.
Still useful for testing: it actually exercises functionality like multi-file patches. But maybe testing is another story: we could take in the GPT-5 optimizations and then make sure the integration tests use them. WDYT?
yeah, i don't think we absolutely need this example 🤔
But iirc you already wrote the docs; maybe we can keep it and add this example as a page under LLM features of https://docs.openhands.dev/sdk/guides/
Actually, since it's not a default tool now, I'd tend to keep it 🤔 It's easy even for the agent to directly execute it and see the tool in action
More importantly, it lets client developers and contributors test a GPT-5-specific tool, and maybe it gives people ideas for experimenting with it?
Not a strong opinion though; I'd just love to continue assembling prompts, too, and simply make them available
I removed it, and updated the docs example to be a simple script:
… example in this PR

- Revert model-aware default tools wiring from preset/default.py
- Remove model-specific default tools logic from model_features.py
- Drop preset default tests tied to GPT-5 mapping

Follow-up PR will carry model-aware defaults + wiring.

Co-authored-by: openhands <openhands@all-hands.dev>
def test_function_call_and_output_paired():
    # Assistant emits a function_call; tool returns an output for same id
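The quoted test is truncated here, but its idea can be sketched self-contained. The item shapes and ids below are illustrative, not the SDK's actual message types:

```python
def test_function_call_and_output_paired():
    """Every function_call_output should pair with an emitted function_call."""
    items = [
        {"type": "function_call", "call_id": "call_1", "name": "apply_patch"},
        {"type": "function_call_output", "call_id": "call_1", "output": "ok"},
    ]
    call_ids = [i["call_id"] for i in items if i["type"] == "function_call"]
    output_ids = [i["call_id"] for i in items if i["type"] == "function_call_output"]
    assert call_ids == output_ids


test_function_call_and_output_paired()
```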
This test exists because the agent had some trouble when it was testing apply_patch initially. I think it’s a reasonable test, maybe a little too obvious (practically nothing with GPT-5/Responses would work if it didn’t), but I’d keep it
@xingyaoww Done, it looks to me like the commit is exactly right: it undid the wiring. The rest has been tested before, so unless there's some lost comma messing with us, it seems good to go. 😇
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment. Feel free to include any additional details that might help me get this PR into a better state.
xingyaoww left a comment
Thanks!
Summary
Design notes
Usage
uv run python examples/01_standalone_sdk/28_apply_patch_with_gpt5_1.py
Responses API/tool-calling
Testing performed
Next steps
Co-authored-by: openhands <openhands@all-hands.dev>
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.12-nodejs22
- golang:1.21-bookworm

Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:56abec9-python

Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g. 56abec9-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g. 56abec9-python-amd64) are also available if needed