This repository contains the code for our survey and benchmark of high-dimensional Bayesian optimization of discrete sequences using poli and poli-baselines.
Check our leaderboards on our project website.
We expect contributions to this benchmark to be implemented as solvers in poli-baselines. Follow the documentation therein.
In a few words, we expect you to provide the following folder structure:
# In poli-baselines' solvers folder
solvers
├── your_solver_name
│ ├── __init__.py
│ ├── environment.your_solver_name.yml
│ └── your_solver_name.py

We expect environment.your_solver_name.yml to create a conda environment in which your_solver_name.py can be imported. See a template here:
name: poli__your_solver_name
channels:
  - defaults
dependencies:
  - python=3.10
  - pip
  - pip:
    - your
    - dependencies
    - here
    - "git+https://github.com/MachineLearningLifeScience/poli.git@dev"
    - "git+https://github.com/MachineLearningLifeScience/poli-baselines.git@main"

Provide said code as a pull request to poli-baselines. Afterwards, we will register it, run it, and add its reports to our ongoing benchmarks.
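Before opening the pull request, it may help to confirm that your solver folder matches the layout described above. The following is a minimal sketch using only the standard library; the function name `missing_solver_files` and the example path are hypothetical and are not part of poli-baselines.

```python
from pathlib import Path


def missing_solver_files(solvers_dir, solver_name):
    """Return the expected solver files that are not present on disk.

    Hypothetical helper: it only checks for the three files named in
    the folder structure above, nothing else.
    """
    solver_dir = Path(solvers_dir) / solver_name
    expected = [
        "__init__.py",
        f"environment.{solver_name}.yml",
        f"{solver_name}.py",
    ]
    return [name for name in expected if not (solver_dir / name).is_file()]


if __name__ == "__main__":
    # Assumed path: adjust it to the solvers folder of your local checkout.
    missing = missing_solver_files("solvers", "your_solver_name")
    print("Missing files:", missing if missing else "none")
```

An empty list means the three expected files exist. It says nothing about whether the conda environment actually builds or whether the module imports.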
If you are eager to test your solver on our problems, you can set up local testing in this repository. We provide a requirements.txt/environment.yml you can use to create an environment for running the benchmarks. Afterwards, install this package:
conda create -n hdbo_benchmark python=3.10
conda activate hdbo_benchmark
pip install -r requirements.txt
pip install -e .

Change the WANDB_PROJECT and WANDB_ENTITY in src/hdbo_benchmark/utils/constants.py.
After implementing a solver in poli-baselines, you can register it in src/hdbo_benchmark/utils/experiments/load_solvers.py.
The scripts used to run the benchmarks can be found in src/hdbo_benchmark/experiments. For example, to run albuterol_similarity from the PMO benchmark:
conda run -n hdbo_benchmark python src/hdbo_benchmark/experiments/benchmark_on_pmo/run.py \
    --function-name=albuterol_similarity \
    --solver-name=line_bo \
    --latent-dim=128 \
    --max-iter=300

This assumes hdbo_benchmark is an environment in which you can run your solver, and in which this package is installed. Examples of environments in which solvers have been tested can be found in poli-baselines.
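To launch several runs, the command above can be assembled programmatically. The sketch below only builds and prints the command lines; the helper `build_command` is hypothetical, and the lists contain just the function and solver named above, so extend them with names you know to be registered.

```python
import itertools
import shlex

RUN_SCRIPT = "src/hdbo_benchmark/experiments/benchmark_on_pmo/run.py"


def build_command(function_name, solver_name, latent_dim=128, max_iter=300,
                  env_name="hdbo_benchmark"):
    """Build the argument list for one benchmark run (hypothetical helper)."""
    return [
        "conda", "run", "-n", env_name, "python", RUN_SCRIPT,
        f"--function-name={function_name}",
        f"--solver-name={solver_name}",
        f"--latent-dim={latent_dim}",
        f"--max-iter={max_iter}",
    ]


if __name__ == "__main__":
    # Only the names that appear in the example above; add your own.
    function_names = ["albuterol_similarity"]
    solver_names = ["line_bo"]
    for function_name, solver_name in itertools.product(function_names, solver_names):
        # Print rather than execute, so the commands can be inspected first.
        print(shlex.join(build_command(function_name, solver_name)))
```

Each printed line can be pasted into a shell, or the argument list can be passed to `subprocess.run` once the commands look right.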
We use torchdrug to download the dataset. It has very picky dependencies, but you should be able to install it by running
conda env create --file environment.data_preprocessing.yml

and following the scripts in src/hdbo_benchmark/data_preprocessing/zinc250k inside that env (conda activate hdbo__data_preprocessing).
Depending on the black box you use within poli, we expect you to cite a set of references. Check the documentation of the black box for a list (including bibtex).