Official implementation of LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers.
LLM-FE is a novel framework that leverages Large Language Models (LLMs) as evolutionary optimizers to automate feature engineering for tabular datasets. LLM-FE iteratively generates and refines features using structured prompts, selecting high-impact transformations based on model performance. This approach enables the discovery of interpretable and high-quality features, enhancing the performance of various machine learning models across diverse classification and regression tasks.
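The generate, evaluate, and select loop described above can be illustrated with a toy sketch. This is not the repository's implementation: the LLM is replaced by a hypothetical fixed pool of candidate transformations (`CANDIDATES`, `propose`), model performance is replaced by a simple correlation score (`score`), and the refinement step driven by structured prompts is omitted. All names here are assumptions made for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


# Hypothetical stand-in for LLM output: a fixed pool of named transformations.
CANDIDATES = {
    "a_plus_b": lambda r: r["a"] + r["b"],
    "a_minus_b": lambda r: r["a"] - r["b"],
    "a_times_b": lambda r: r["a"] * r["b"],
    "a_squared": lambda r: r["a"] ** 2,
}


def propose(rng, k=2):
    """Mock proposer: sample k candidate names (a real system would prompt an LLM)."""
    return rng.sample(sorted(CANDIDATES), k)


def score(rows, target, name):
    """Stand-in fitness: |correlation| between the new feature and the target."""
    feature = [CANDIDATES[name](r) for r in rows]
    return abs(pearson(feature, target))


def evolve(rows, target, iterations=20, seed=0):
    """Repeatedly propose candidates and keep the best-scoring transformation."""
    rng = random.Random(seed)
    best_name, best_score = None, -1.0
    for _ in range(iterations):
        for name in propose(rng):
            s = score(rows, target, name)
            if s > best_score:
                best_name, best_score = name, s
    return best_name, best_score


# Toy regression data in which the target is the product of the two raw columns.
data_rng = random.Random(42)
rows = [{"a": data_rng.uniform(1, 5), "b": data_rng.uniform(1, 5)} for _ in range(200)]
target = [r["a"] * r["b"] for r in rows]

best_name, best_score = evolve(rows, target)
print(best_name, round(best_score, 3))
```

In a real pipeline the score would come from training and validating a downstream model on the augmented table, and the proposer would condition on previously successful features rather than sampling blindly.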
To run the code, create a conda environment and install the dependencies using requirements.txt:
conda create -n llmfe python=3.11.7
conda activate llmfe
pip install -r requirements.txt
In the run_llmfe.sh file, set your OpenAI API key on the following line:
export API_KEY=<ENTER YOUR API KEY>
To run the LLM-FE pipeline on a sample dataset:
bash run_llmfe.sh
@article{abhyankar2025llm,
title={LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers},
author={Abhyankar, Nikhil and Shojaee, Parshin and Reddy, Chandan K},
journal={arXiv preprint arXiv:2503.14434},
year={2025}
}
This repository is licensed under the MIT License.
This work builds on other open-source projects, including FunSearch and LLM-SR. We thank the original contributors of these projects for open-sourcing their code.
For any questions or issues, you are welcome to open an issue in this repo, or contact us at nikhilsa@vt.edu and parshinshojaee@vt.edu.
