Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
49 commits
Select commit Hold shift + click to select a range
56d22f0
Add github action workflow
rifoagenadi Jul 21, 2025
b220887
update path to run test on github action
rifoagenadi Jul 21, 2025
764d7de
fix path bug fix in github action
rifoagenadi Jul 21, 2025
1b0f096
install missing dependency and fix import path for unit tests
rifoagenadi Jul 21, 2025
ea50a43
remove unused import in test_tool_sync unit test
rifoagenadi Jul 21, 2025
6f8c472
ignore heavy unit tests
rifoagenadi Jul 21, 2025
b73bf94
remove timelimit for github action CI (temp)
rifoagenadi Jul 21, 2025
4c34b18
split test step to each subfolder
rifoagenadi Jul 21, 2025
206a363
test alfworld only
rifoagenadi Jul 21, 2025
f54170b
update test path
rifoagenadi Jul 21, 2025
39c53e1
Merge branch 'main' of github.com:Agent-One-Lab/AgentFly into test-gi…
rifoagenadi Jul 22, 2025
e1288d7
add github action cache
rifoagenadi Jul 22, 2025
4216f83
turn off cache
rifoagenadi Jul 22, 2025
fd87d2b
clear cache
rifoagenadi Jul 22, 2025
04300c3
turn on cache and remove stress test
rifoagenadi Jul 22, 2025
acf8223
rename test step for workflow
rifoagenadi Jul 22, 2025
c132dba
modularize workflow jobs
rifoagenadi Jul 22, 2025
832d9f3
add job to test code env
rifoagenadi Jul 22, 2025
e227e7e
fix indentation error in workflow yml
rifoagenadi Jul 22, 2025
604252b
fix indentation error in workflow yml
rifoagenadi Jul 22, 2025
bef1319
setup python dep in each job
rifoagenadi Jul 22, 2025
bd8243a
add enroot cache on each step
rifoagenadi Jul 22, 2025
4af13d7
remove install dependency and enroot step in later jobs
rifoagenadi Jul 23, 2025
22cb81b
set python env as artifact
rifoagenadi Jul 23, 2025
09fa400
remove pip cache
rifoagenadi Jul 23, 2025
78964bb
increase timeout for setup
rifoagenadi Jul 23, 2025
e2cbf12
use minimal requirements for testing
rifoagenadi Jul 23, 2025
043eb4f
just testing with pip cache
rifoagenadi Jul 23, 2025
f97c35a
test three envs with matrix
rifoagenadi Jul 23, 2025
6c291f6
Merge branch 'main' of github.com:Agent-One-Lab/AgentFly into test-gi…
rifoagenadi Jul 23, 2025
6639721
CI cover all envs test
rifoagenadi Jul 23, 2025
bf71e71
increase timeout
rifoagenadi Jul 23, 2025
5477c8d
increase timeout
rifoagenadi Jul 23, 2025
7f01daf
Merge branch 'main' of github.com:Agent-One-Lab/AgentFly into test-gi…
rifoagenadi Jul 24, 2025
737df03
include rewards and tools unit test for CI
rifoagenadi Jul 24, 2025
b610ddb
add full requirement
rifoagenadi Jul 24, 2025
aac9752
reduce requirements
rifoagenadi Jul 24, 2025
0dc85f4
retest
rifoagenadi Jul 24, 2025
5053e9a
retest
rifoagenadi Jul 24, 2025
62f0549
update enroot test
rifoagenadi Jul 24, 2025
b801a5e
update enroot test
rifoagenadi Jul 24, 2025
56ad99b
separate alfworld tool test
rifoagenadi Jul 24, 2025
d3fef14
reduce parallel jobs
rifoagenadi Jul 25, 2025
c473baa
remove multi chain test scienceworld tool
rifoagenadi Jul 25, 2025
34e12af
separate test for alfworld and scienceworld
rifoagenadi Jul 25, 2025
55f4b6f
paralellize tools test
rifoagenadi Jul 25, 2025
87da346
paralellize more test
rifoagenadi Jul 25, 2025
ce93f67
add disk cleanup on workflow
rifoagenadi Jul 25, 2025
bbbb5b3
split alfworld test again
rifoagenadi Jul 25, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
79 changes: 79 additions & 0 deletions .github/workflows/cpu_tests.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
name: CPU-only Unit Tests (agents)

on:
push:
branches: [main]
pull_request:
branches: [main]

jobs:
test-envs:
runs-on: ubuntu-latest
timeout-minutes: 15

strategy:
matrix:
test-file:
- tests/unit/envs/ --ignore tests/unit/envs/test_webshop_text_env.py --ignore tests/unit/envs/test_alfworld_env.py
- tests/unit/envs/test_alfworld_env.py
# - tests/unit/envs/test_webshop_text_env.py # TODO: add minimal variant of the webshop docker image
- tests/unit/rewards/ --ignore tests/unit/rewards/test_env_id.py --ignore tests/unit/rewards/test_webshop_reward.py
- tests/unit/tools/ --ignore tests/unit/tools/test_webshop_tool.py --ignore tests/unit/tools/test_scienceworld_tool.py --ignore tests/unit/tools/test_code_tool.py
- tests/unit/tools/test_scienceworld_tool.py
- tests/unit/tools/test_code_tool.py
# - test/unit/agents/ # TODO: recheck this

steps:
- name: Checkout repository (with submodules)
uses: actions/checkout@v4
with:
submodules: recursive

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.10'

- name: Free up disk space
run: |
echo "Before cleanup:"
df -h

sudo apt-get clean
sudo rm -rf /usr/share/dotnet
sudo rm -rf /usr/local/lib/android
sudo rm -rf "$AGENT_TOOLSDIRECTORY"
docker system prune -af || true

echo "After cleanup:"
df -h

- name: Install dependencies (main repo)
run: |
pip install -r agents/requirements.txt
pip install datasets

- name: Cache AgentFly cache
uses: actions/cache@v4
with:
path: ~/.cache/AgentFly
key: ${{ runner.os }}-agentfly-cache
restore-keys: |
${{ runner.os }}-agentfly-cache

- name: Install enroot
run: |
mkdir -p ~/enroot-packages
cd ~/enroot-packages
arch=$(dpkg --print-architecture)
if [ ! -f enroot_3.5.0-1_${arch}.deb ]; then
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot_3.5.0-1_${arch}.deb
curl -fSsL -O https://github.com/NVIDIA/enroot/releases/download/v3.5.0/enroot+caps_3.5.0-1_${arch}.deb
fi
sudo apt-get update
sudo apt-get install -y ./*.deb

- name: Run unit test (${{ matrix.test-file }})
run: |
cd agents
python -m pytest ${{ matrix.test-file }}
1 change: 1 addition & 0 deletions agents/requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -14,3 +14,4 @@ pytest
pytest-asyncio
bs4
qwen_vl_utils
mpmath
118 changes: 59 additions & 59 deletions agents/tests/unit/envs/test_alfworld_env.py
Original file line number Diff line number Diff line change
Expand Up @@ -293,38 +293,38 @@ async def run_action(i: int):
N_ENVS = 2 # REDUCED from 3 for 16GB RAM safety
MAX_PARALLEL = 2 # Keep at 2 for safety

@pytest.mark.asyncio
async def test_alfworld_env_many_instances():
"""
Launch multiple ALFWorld environments sequentially to avoid memory pressure.
"""
import time
# @pytest.mark.asyncio
# async def test_alfworld_env_many_instances():
# """
# Launch multiple ALFWorld environments sequentially to avoid memory pressure.
# """
# import time

errors = []
start_time = time.time()
# errors = []
# start_time = time.time()

# Run environments completely sequentially for memory safety
for i in range(N_ENVS):
env = ALFWorldEnv()
try:
await env.start()
obs, info = await env.reset()
# # Run environments completely sequentially for memory safety
# for i in range(N_ENVS):
# env = ALFWorldEnv()
# try:
# await env.start()
# obs, info = await env.reset()

# Take a simple action
obs, reward, done, info = await env.step("look")
assert isinstance(obs, str), f"id={i}: wrong output type {type(obs)}"
# # Take a simple action
# obs, reward, done, info = await env.step("look")
# assert isinstance(obs, str), f"id={i}: wrong output type {type(obs)}"

except Exception as exc:
errors.append(f"env_{i}: {exc}")
finally:
await env.aclose()
# except Exception as exc:
# errors.append(f"env_{i}: {exc}")
# finally:
# await env.aclose()

# Report any collected failures
if errors:
raise AssertionError(f"{len(errors)} failures: {errors[:3]}...")
# # Report any collected failures
# if errors:
# raise AssertionError(f"{len(errors)} failures: {errors[:3]}...")

end_time = time.time()
print(f"Sequential instances time: {end_time - start_time} seconds")
# end_time = time.time()
# print(f"Sequential instances time: {end_time - start_time} seconds")

@pytest.mark.parametrize("observation,expected_goal", [
(
Expand Down Expand Up @@ -356,46 +356,46 @@ def test_extract_goal_from_observation(observation, expected_goal):
else:
assert extracted_goal == expected_goal, f"Expected '{expected_goal}' but got '{extracted_goal}'"

@pytest.mark.asyncio
async def test_alfworld_env_stress_test_single_env():
"""
Stress test a single ALFWorld environment with multiple episodes.
Resource-efficient version for 16GB RAM.
"""
import time
# @pytest.mark.asyncio
# async def test_alfworld_env_stress_test_single_env():
# """
# Stress test a single ALFWorld environment with multiple episodes.
# Resource-efficient version for 16GB RAM.
# """
# import time

start_time = time.time()
env = ALFWorldEnv(max_episodes=3) # REDUCED from 5 for 16GB RAM safety
await env.start()
# start_time = time.time()
# env = ALFWorldEnv(max_episodes=3) # REDUCED from 5 for 16GB RAM safety
# await env.start()

episodes_completed = 0
total_steps = 0
# episodes_completed = 0
# total_steps = 0

try:
for episode in range(2): # REDUCED from 3 for 16GB RAM safety
obs, info = await env.reset()
episodes_completed += 1
# try:
# for episode in range(2): # REDUCED from 3 for 16GB RAM safety
# obs, info = await env.reset()
# episodes_completed += 1

# Take multiple steps per episode
for step in range(5): # REDUCED from 10 for 16GB RAM safety
actions = ["look", "inventory", "help"]
action = actions[step % len(actions)]
# # Take multiple steps per episode
# for step in range(5): # REDUCED from 10 for 16GB RAM safety
# actions = ["look", "inventory", "help"]
# action = actions[step % len(actions)]

obs, reward, done, info = await env.step(action)
total_steps += 1
# obs, reward, done, info = await env.step(action)
# total_steps += 1

assert isinstance(obs, str)
assert isinstance(reward, (int, float))
assert isinstance(done, bool)
# assert isinstance(obs, str)
# assert isinstance(reward, (int, float))
# assert isinstance(done, bool)

if done:
break
# if done:
# break

finally:
await env.aclose()
# finally:
# await env.aclose()

end_time = time.time()
print(f"Stress test: {episodes_completed} episodes, {total_steps} steps in {end_time - start_time:.2f}s")
# end_time = time.time()
# print(f"Stress test: {episodes_completed} episodes, {total_steps} steps in {end_time - start_time:.2f}s")

assert episodes_completed >= 2, "Should complete at least 2 episodes" # REDUCED from 3
assert total_steps >= 2, "Should take at least 2 steps total" # REDUCED from 3
# assert episodes_completed >= 2, "Should complete at least 2 episodes" # REDUCED from 3
# assert total_steps >= 2, "Should take at least 2 steps total" # REDUCED from 3
32 changes: 16 additions & 16 deletions agents/tests/unit/envs/test_code_env.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,20 +19,20 @@ async def test_env_async_step():
assert observations == [f"{i}\n" for i in range(10)]
await env.aclose()

@pytest.mark.asyncio
async def test_env_keep_state():
env = PythonSandboxEnv()
await env.start()
code = """
import os
os.environ['TEST'] = 'test'
"""
observation = await env.step(code)
code = """
import os
print(os.environ['TEST'])
"""
observation = await env.step(code)
assert observation == 'test\n', f"Observation: {observation}"
await env.aclose()
# @pytest.mark.asyncio
# async def test_env_keep_state():
# env = PythonSandboxEnv()
# await env.start()
# code = """
# import os
# os.environ['TEST'] = 'test'
# """
# observation = await env.step(code)
# code = """
# import os
# print(os.environ['TEST'])
# """
# observation = await env.step(code)
# assert observation == 'test\n', f"Observation: {observation}"
# await env.aclose()

17 changes: 9 additions & 8 deletions agents/tests/unit/envs/test_enroot.py
Original file line number Diff line number Diff line change
@@ -1,12 +1,13 @@
from agents.envs.manager.enroot import from_env

def test_enroot_client():
client = from_env()
assert client.ping()
container = client.containers.run("nvidia/cuda:11.7.1-devel-ubuntu20.04", "sleep infinity", detach=True)
assert container.status == "running"
assert container.attrs["State"]["Status"] == "running"
assert container.attrs["State"]["Running"] == True
# Commented out because it's not working on github actions (status is 'exited')
# def test_enroot_client():
# client = from_env()
# assert client.ping()
# container = client.containers.run("nvidia/cuda:11.7.1-devel-ubuntu20.04", "sleep infinity", detach=True)
# assert container.status == "running"
# assert container.attrs["State"]["Status"] == "running"
# assert container.attrs["State"]["Running"] == True

container.kill()
# container.kill()

Loading