This repository provides the implementation accompanying the paper "MULTI-LINGUAL ML BENCHMARK FOR AUTOML".
It includes the code for dataset construction, the evaluation framework, and the agents assessed within this benchmark.
We use `uv` for environment management. Install `uv` once, then run `uv sync` (or `uv pip install -r requirements.txt`) inside the project to create the virtual environment.
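
If `uv` is not installed yet, a common way to get it is the upstream standalone installer or a plain `pip` install (a minimal sketch; check the uv documentation for the recommended method on your platform):

```bash
# Option 1: standalone installer (Linux/macOS)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Option 2: install uv into an existing Python environment
pip install uv
```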
- Install dependencies:

  ```bash
  uv sync
  ```

- Activate the virtual environment:

  ```bash
  source .venv/bin/activate
  ```

- Build the agent runtime:
  ```bash
  python ml2b.py build-runtime -i aide --agent-dir agents/aide

  # For ARM platforms use:
  python ml2b.py build-runtime -i react --agent-dir agents/react --platform "linux/arm64"
  ```

  (If you use another agent, maintain the same file structure and command pattern.)

  For proxy settings, see `python ml2b.py build-runtime --help` for details.
- Download and prepare the dataset:

  ```bash
  python ml2b.py prepare-data
  ```

  (The dataset can also be downloaded manually from the Hugging Face Hub; place the resulting `data` and `tasks` directories into `competitions/`.)

  For customization details, see the dataset implementation documentation.
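
  As an illustration of the manual route, the `huggingface-cli` tool can download a dataset repository into a local directory (a sketch only; the repository ID below is a placeholder, substitute the benchmark's actual dataset ID):

  ```bash
  # <dataset-repo-id> is a placeholder -- replace it with the benchmark's dataset ID
  huggingface-cli download <dataset-repo-id> --repo-type dataset --local-dir competitions
  ```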
- Unpack the competition files:

  ```bash
  cd competitions/data
  chmod +x ./unpack_files.sh
  ./unpack_files.sh
  ```

  This will extract the `.tar.gz` archives with the competition files.
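
  For reference, the step amounts to extracting every archive in place, roughly along these lines (a sketch, not the script's exact contents):

  ```bash
  # Extract each .tar.gz archive in the current directory
  for archive in *.tar.gz; do
      tar -xzf "$archive"
  done
  ```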
After completing the preparation steps, you should see the following folder structure:
```
.
├── ml2b.py
└── competitions/
    ├── data/
    ├── competitions.json
    └── tasks/
```
- Configure agent parameters in the corresponding directory (e.g., `agents/aide/config.yaml`). Ensure that necessary environment variables such as `OPENAI_API_KEY` are exported in your shell.
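
  For example (assuming an OpenAI-backed agent; adapt the variable names to whatever provider your agent's config expects):

  ```bash
  # Export the API key in the shell that will launch the benchmark
  export OPENAI_API_KEY="<your-api-key>"
  ```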
- Configure Docker runtime limits in `runtime_config.json`. Optional: you may change the proxy settings for the validation container in `squid.conf`.
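
  As a purely illustrative sketch of Docker-style resource limits, a config of this kind often looks something like the following (all key names below are hypothetical; consult the actual `runtime_config.json` shipped with the repository for the real schema):

  ```json
  {
    "cpus": 4,
    "memory": "16g",
    "gpus": "all",
    "time_limit_seconds": 3600
  }
  ```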
- Run the benchmark (see `python ml2b.py bench --help` for more options):

  ```bash
  python ml2b.py bench -i aide -w 3 --agent-dir agents/aide --seed 42 --args-variant extended --code-variant extended
  ```

General documentation can be found in `docs`.