This repository provides the implementation accompanying the paper "MULTI-LINGUAL ML BENCHMARK FOR AUTOML".
It includes the code for dataset construction, the evaluation framework, and the agents assessed within this benchmark.
We use `uv` for environment management. Install `uv` once, then run `uv sync` (or `uv pip install -r requirements.txt`) inside the project to create the virtual environment.
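
If `uv` is not installed yet, a common way to get it is the upstream standalone installer or a plain `pip` install (a minimal sketch; check the uv documentation for the recommended method on your platform):

```bash
# Option 1: standalone installer (Linux/macOS)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Option 2: install uv into an existing Python environment
pip install uv
```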
- Install dependencies:

  ```bash
  uv sync
  ```

- Activate the virtual environment:

  ```bash
  source .venv/bin/activate
  ```

- Build the agent runtime:
  ```bash
  python ml2b.py build-runtime -i aide --agent-dir agents/aide

  # For ARM platforms use:
  python ml2b.py build-runtime -i react --agent-dir agents/react --platform "linux/arm64"
  ```

  (If you use another agent, maintain the same file structure and command pattern.)

  For proxy settings, see `python ml2b.py build-runtime --help` for details.
- Download and prepare the dataset:

  ```bash
  python ml2b.py prepare-data
  ```

  (The dataset can also be downloaded manually from the Hugging Face Hub; place the resulting `data` and `tasks` directories into `competitions/`.)

  For customization details, see the dataset implementation documentation.
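
  As an illustration of the manual route, the `huggingface-cli` tool can download a dataset repository into a local directory (a sketch only; the repository ID below is a placeholder, substitute the benchmark's actual dataset ID):

  ```bash
  # <dataset-repo-id> is a placeholder -- replace it with the benchmark's dataset ID
  huggingface-cli download <dataset-repo-id> --repo-type dataset --local-dir competitions
  ```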
- Unpack the competition files:

  ```bash
  cd competitions/data
  chmod +x ./unpack_files.sh
  ./unpack_files.sh
  ```

  This will extract the `.tar.gz` archives with the competition files.
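
  For reference, the step amounts to extracting every archive in place, roughly along these lines (a sketch, not the script's exact contents):

  ```bash
  # Extract each .tar.gz archive in the current directory
  for archive in *.tar.gz; do
      tar -xzf "$archive"
  done
  ```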
After completing the preparation steps, you should see the following folder structure:
```
.
├── ml2b.py
└── competitions/
    ├── data/
    ├── competitions.json
    └── tasks/
```
- Configure agent parameters in the corresponding directory (e.g., `agents/aide/config.yaml`). Ensure that necessary environment variables such as `OPENAI_API_KEY` are exported in your shell.
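
  For example (assuming an OpenAI-backed agent; adapt the variable names to whatever provider your agent's config expects):

  ```bash
  # Export the API key in the shell that will launch the benchmark
  export OPENAI_API_KEY="<your-api-key>"
  ```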
- Configure Docker runtime limits in `runtime_config.json`. Optional: you may change the proxy settings for the validation container in `squid.conf`.
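
  As a purely illustrative sketch of Docker-style resource limits, a config of this kind often looks something like the following (all key names below are hypothetical; consult the actual `runtime_config.json` shipped with the repository for the real schema):

  ```json
  {
    "cpus": 4,
    "memory": "16g",
    "gpus": "all",
    "time_limit_seconds": 3600
  }
  ```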
- Run the benchmark (see `python ml2b.py bench --help` for more options):

  ```bash
  python ml2b.py bench -i aide -w 3 --agent-dir agents/aide --seed 42 --args-variant extended --code-variant extended
  ```

General documentation can be found in `docs`.