MALAMUTE: A Multilingual, Highly-granular, Template-free, Education-based Probing Dataset

MALAMUTE is a benchmark designed to evaluate language models on factual knowledge across diverse languages and educational levels. It avoids templates and emphasizes nuanced, real-world understanding.
- Python 3.11
- Conda (for environment management)
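
The two prerequisites above can be checked with a short script before going further. This is only a sketch: the function name `check_prerequisites` and the constant `REQUIRED_PYTHON` are hypothetical and not part of the repository, and the exact-match test on 3.11 is an assumption based on the version listed above. The Python check is most meaningful when run inside the activated Conda environment; the Conda check works from any shell.

```python
import shutil
import sys

# Assumption: the README lists Python 3.11, so this sketch treats any other
# major.minor version as a mismatch. Relax the check if other versions work.
REQUIRED_PYTHON = (3, 11)


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good.

    `version_info` and `which` are injectable so the check can be exercised
    without depending on the machine it runs on.
    """
    problems = []
    # Compare only major.minor; the patch level is not constrained here.
    if tuple(version_info[:2]) != REQUIRED_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python 3.11 expected, found {found}")
    # shutil.which returns None when the executable is not on PATH.
    if which("conda") is None:
        problems.append("conda executable not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(f"warning: {problem}")
```

Run as a script, it prints one warning per unmet prerequisite and nothing when both are satisfied.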

Clone the repository, create the Conda environment, and install the dependencies:

    git clone https://github.com/Shaier/MALAMUTE.git
    cd MALAMUTE
    conda create -n malamute python=3.11
    conda activate malamute
    pip install -r requirements.txt

Unzip the dataset and remove any extraneous files:

    rm -rf data && unzip -o data.zip -d data && rm data.zip

To evaluate masked language models (MLMs, e.g., BERT-style models):

    python test_MLM.py

To evaluate causal language models (CLMs, e.g., GPT-style models), see the notebooks repo.

If you use this code or dataset, please cite us:

    @misc{shaier2025malamutemultilingualhighlygranulartemplatefree,
      title={MALAMUTE: A Multilingual, Highly-granular, Template-free, Education-based Probing Dataset},
      author={Sagi Shaier and George Arthur Baker and Chiranthan Sridhar and Lawrence E Hunter and Katharina von der Wense},
      year={2025},
      eprint={2412.10105},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.10105}
    }