We are currently using https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified/ , which contains 500 instances, but we should use https://huggingface.co/datasets/eth-sri/SWT-bench_Verified_bm25_27k_zsp/ , which contains 433 instances. See the docs: https://github.com/logic-star-ai/swt-bench/tree/master?tab=readme-ov-file
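A minimal sketch of the switch, assuming the dataset is loaded with the Hugging Face `datasets` library. The helper names (`repo_id_from_url`, `load_swt_bench`) are hypothetical, and the split name `"test"` is an assumption that should be checked against the dataset card:

```python
from urllib.parse import urlparse

OLD_DATASET_URL = "https://huggingface.co/datasets/princeton-nlp/SWE-bench_Verified/"
NEW_DATASET_URL = "https://huggingface.co/datasets/eth-sri/SWT-bench_Verified_bm25_27k_zsp/"
EXPECTED_INSTANCES = 433  # instance count stated above for the SWT-bench dataset


def repo_id_from_url(url: str) -> str:
    """Turn a Hugging Face dataset URL into its 'org/name' repo id."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return f"{parts[1]}/{parts[2]}"


def load_swt_bench(split: str = "test"):
    """Load the SWT-bench dataset and sanity-check the instance count.

    Assumption: the split is named "test"; adjust if the dataset card differs.
    """
    # Imported lazily so the helpers above work without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(repo_id_from_url(NEW_DATASET_URL), split=split)
    if len(dataset) != EXPECTED_INSTANCES:
        raise RuntimeError(
            f"expected {EXPECTED_INSTANCES} instances, got {len(dataset)}"
        )
    return dataset
```

The count check is there so that accidentally pointing back at the 500-instance dataset fails loudly instead of silently changing the evaluation set.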