Executing agent with multiple projects in parallel #9
+59
−26
This PR aims to improve how a validator executes an agent's code, mainly to increase speed and flexibility. Speed is most useful for local development and debugging, while flexibility should help on the live validator side. The PR has three parts:
The proxy Docker container now runs 128 workers. This supports parallel agent runs and, eventually, agents that make multiple LLM calls concurrently. The idea is to keep the proxy service from becoming a bottleneck for agent executions.
The Docker image for the agent sandbox is now built locally before execution. This should help with:
** Reducing the workload of, and dependency on, prebuilt images for each project across platforms and operating systems.
** Robustness: the Docker images should be more robust now, and the validator should be able to work on any system.
** Adding new projects to the validation pipeline quickly: add the project name to the projects list and restart the validator (no image builds required).
** Adding more libraries for agents to use quickly, without having to centrally rebuild all the Docker images.
** Keeping overhead low: there is very little overhead on the validator side because the Docker build only runs the first time, after which Docker caches the images.
The validator now runs multiple sandbox containers, one for each project. The agent runs and solves projects in parallel instead of in sequence. This saves a lot of time with only a very small increase in cost (the sandbox containers use very few resources). I have found this very useful for development and testing, and it might also help speed up evaluations on the live system.
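For readers who want a picture of the build-then-run-in-parallel flow, here is a minimal sketch. It is not the code in this PR: the names `build_image_cmd`, `run_sandbox`, `run_projects_in_parallel`, the `agent-sandbox-<project>` tag scheme, and the `PROJECT` build argument are all hypothetical, and a thread pool is just one plausible way to start one sandbox per project.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional


def build_image_cmd(project: str, context_dir: str = ".") -> List[str]:
    """Return a `docker build` command for one project's sandbox image.

    The tag scheme and the PROJECT build arg are assumptions for illustration.
    Docker image names must be lowercase, so the project name is lowercased.
    """
    tag = f"agent-sandbox-{project.lower()}"
    return ["docker", "build", "-t", tag, "--build-arg", f"PROJECT={project}", context_dir]


def run_sandbox(project: str) -> int:
    """Build the image locally (cached by Docker after the first build), then run it."""
    subprocess.run(build_image_cmd(project), check=True)
    tag = f"agent-sandbox-{project.lower()}"
    return subprocess.run(["docker", "run", "--rm", tag]).returncode


def run_projects_in_parallel(
    projects: List[str],
    run_one: Callable[[str], int] = run_sandbox,
    max_workers: Optional[int] = None,
) -> Dict[str, int]:
    """Run one sandbox per project concurrently and collect a result per project."""
    if not projects:
        return {}
    # Threads are enough here: each worker mostly waits on a docker subprocess.
    with ThreadPoolExecutor(max_workers=max_workers or len(projects)) as pool:
        futures = {p: pool.submit(run_one, p) for p in projects}
        return {p: f.result() for p, f in futures.items()}
```

Passing a different `run_one` callable lets the parallel scheduling be exercised without Docker installed, for example `run_projects_in_parallel(["a", "b"], run_one=lambda p: 0)`.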