Commit 9aabf7f

Add SDK compatibility documentation and benchmarks-commit parameter (#119)

* Add SDK compatibility documentation and benchmarks-commit parameter

- Document SDK critic module breaking change (commit 79868ae5) in README
- Add optional benchmarks-commit parameter to build-swe-bench-images workflow
- Update checkout step to support evaluating older SDK versions with compatible benchmarks code
- Maintain backward compatibility - workflow behaves the same when parameter is not provided

Fixes #118

Co-authored-by: openhands <openhands@all-hands.dev>

* Refactor documentation to emphasize general SDK version compatibility

- Make the documentation more general about benchmarks/SDK version dependencies
- Present SDK critic module as an example rather than the main focus
- Clarify that version incompatibilities can arise as both codebases evolve

Co-authored-by: openhands <openhands@all-hands.dev>

* Add clarifying comments about empty ref behavior in checkout step

- Explain that empty ref causes actions/checkout to use the triggering commit
- This preserves the original workflow behavior for workflow_dispatch events

Co-authored-by: openhands <openhands@all-hands.dev>

---------

Co-authored-by: openhands <openhands@all-hands.dev>

1 parent 58ef980 commit 9aabf7f

2 files changed: +63 -1 lines changed


.github/workflows/build-swe-bench-images.yml

Lines changed: 22 additions & 1 deletion
@@ -30,6 +30,11 @@ on:
         required: false
         default: ''
         type: string
+      benchmarks-commit:
+        description: 'Benchmarks repository commit/ref to use. Leave blank to use the PR head or main branch. Useful for evaluating older SDK versions that are incompatible with current benchmarks code (e.g., SDK versions before the critic module was added in commit 79868ae5).'
+        required: false
+        default: ''
+        type: string
 
 # Reasonable defaults for automatic (push) runs; workflow_dispatch can override these.
 env:
@@ -61,9 +66,25 @@ jobs:
       issues: write
 
     steps:
+      - name: Determine checkout ref
+        id: checkout-ref
+        run: |
+          if [ "${{ github.event_name }}" = "workflow_dispatch" ] && [ -n "${{ inputs.benchmarks-commit }}" ]; then
+            echo "ref=${{ inputs.benchmarks-commit }}" >> "$GITHUB_OUTPUT"
+            echo "Using benchmarks-commit from workflow_dispatch: ${{ inputs.benchmarks-commit }}"
+          elif [ -n "${{ github.event.pull_request.head.sha }}" ]; then
+            echo "ref=${{ github.event.pull_request.head.sha }}" >> "$GITHUB_OUTPUT"
+            echo "Using PR head SHA: ${{ github.event.pull_request.head.sha }}"
+          else
+            # Empty ref means checkout the ref that triggered the workflow (e.g., main branch for workflow_dispatch)
+            echo "ref=" >> "$GITHUB_OUTPUT"
+            echo "Using default ref (the commit that triggered this workflow)"
+          fi
+
       - uses: actions/checkout@v4
         with:
-          ref: ${{ github.event.pull_request.head.sha }}
+          # When ref is empty, actions/checkout uses the commit that triggered the workflow
+          ref: ${{ steps.checkout-ref.outputs.ref }}
           submodules: recursive
 
       # If this was a manual dispatch, override defaults with provided inputs.
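
The "Determine checkout ref" step in the hunk above applies a three-level precedence: an explicit `benchmarks-commit` from a manual dispatch, then the pull request head SHA, then an empty ref. The sketch below restates that rule as a standalone function so it can be read or exercised outside GitHub Actions. The name `resolve_checkout_ref` and its positional arguments are hypothetical and not part of the commit; in the workflow the three values come from the `${{ ... }}` expressions shown in the diff.

```bash
#!/usr/bin/env bash
# Sketch of the ref-selection precedence from the "Determine checkout ref" step.
# resolve_checkout_ref is a hypothetical helper, not part of the commit.
#   $1: event name (e.g. workflow_dispatch, pull_request)
#   $2: value of the benchmarks-commit input (may be empty)
#   $3: pull request head SHA (empty when the run has no pull request)
resolve_checkout_ref() {
  local event_name="$1" benchmarks_commit="$2" pr_head_sha="$3"

  if [ "$event_name" = "workflow_dispatch" ] && [ -n "$benchmarks_commit" ]; then
    # Manual dispatch with an explicit benchmarks commit wins.
    printf '%s\n' "$benchmarks_commit"
  elif [ -n "$pr_head_sha" ]; then
    # Otherwise fall back to the pull request head, if there is one.
    printf '%s\n' "$pr_head_sha"
  else
    # Empty output: the checkout step then receives an empty ref.
    printf '\n'
  fi
}
```

For example, `resolve_checkout_ref workflow_dispatch abc123 ""` prints `abc123`, while the same call with an empty second argument prints an empty line, which matches the pre-change behaviour for manual runs. Note that a `benchmarks-commit` value is ignored for any event other than `workflow_dispatch`.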

README.md

Lines changed: 41 additions & 0 deletions
@@ -166,6 +166,47 @@ Uses a [remote runtime API](https://openhands.dev/blog/evaluation-of-llms-as-cod
 
 See individual benchmark READMEs for specific usage examples.
 
+## SDK Compatibility and Version Management
+
+⚠️ **Important**: The benchmarks repository depends on the [OpenHands Agent SDK](https://github.com/OpenHands/software-agent-sdk), and **not every version of the benchmarks is compatible with every version of the SDK**. As the SDK evolves and introduces new features, the benchmarks code may adopt these features, creating version dependencies.
+
+### Evaluating Different SDK Versions
+
+When evaluating a specific SDK version, you need to ensure the benchmarks code is compatible with that SDK version. You have two options:
+
+1. **Use the `benchmarks-commit` parameter in the workflow** (Recommended):
+   - When manually triggering the `build-swe-bench-images` workflow, specify both:
+     - `sdk-commit`: The SDK version you want to evaluate
+     - `benchmarks-commit`: A benchmarks commit that's compatible with that SDK version
+
+2. **Manually check out compatible versions locally**:
+   ```bash
+   # Check out a benchmarks commit that's compatible with your target SDK version
+   git checkout <benchmarks-commit>
+
+   # Update the SDK submodule to your target version
+   cd vendor/software-agent-sdk
+   git checkout <sdk-commit>
+   cd ../..
+
+   # Rebuild the environment
+   make build
+   ```
+
+### Example: SDK Critic Module
+
+A notable example of version dependency is the SDK critic module. As of SDK commit [`79868ae5`](https://github.com/OpenHands/software-agent-sdk/commit/79868ae5) (November 17, 2025), the OpenHands Agent SDK introduced the `openhands.sdk.critic` module. Current benchmarks code imports `CriticBase` from this module, which means:
+
+- **SDK versions ≥ `79868ae5`**: Compatible with current benchmarks code
+- **SDK versions < `79868ae5`**: Require an older benchmarks commit (before the critic import was added)
+
+To check if a specific benchmarks commit requires the critic module:
+```bash
+git show <commit>:benchmarks/utils/models.py | grep "from openhands.sdk.critic"
+```
+
+If this command returns output, that benchmarks commit requires an SDK version with the critic module.
+
 ## Links
 
 - **Original OpenHands**: https://github.com/OpenHands/OpenHands/
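
The critic-module check described in the README addition can be wrapped in two small helpers, sketched here under stated assumptions. `requires_critic` and `first_critic_commit` are hypothetical names that do not appear in the commit; the default path `benchmarks/utils/models.py` and the search string are taken from the README text. The second helper relies on `git log -S` (the "pickaxe" search), which lists commits that change the number of occurrences of a string.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; run them from inside a clone of the benchmarks repository.

# requires_critic <commit> [path]
# Succeeds (exit 0) if the file at that commit imports from openhands.sdk.critic.
# A commit where the file does not exist is treated as "does not require it".
requires_critic() {
  local commit="$1"
  local path="${2:-benchmarks/utils/models.py}"
  git show "${commit}:${path}" 2>/dev/null | grep -q "from openhands.sdk.critic"
}

# first_critic_commit [path]
# Prints the oldest commit reachable from HEAD that changed how many times
# the import string occurs in the file, i.e. the commit that introduced it.
first_critic_commit() {
  local path="${1:-benchmarks/utils/models.py}"
  git log --reverse --format=%H -S"from openhands.sdk.critic" -- "$path" | head -n 1
}
```

A possible use is `requires_critic HEAD && echo "needs an SDK with the critic module"`. The parent of the commit printed by `first_critic_commit` is then a candidate value for `benchmarks-commit` when evaluating an older SDK, assuming that commit has a single parent and that the critic import is the only incompatibility in play.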
