41 repositories
RePro (Public): [Preprint 2025] Rectifying LLM Thought From Lens of Optimization
opencompass (Public): OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
Open-source evaluation toolkit of large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
SAGA (Public)
ATLAS (Public)
InteractScience (Public)
CognitiveKernel-Pro (Public): Deep Research Agent CognitiveKernel-Pro from Tencent AI Lab. Paper: https://arxiv.org/pdf/2508.00414
GAOKAO-Eval (Public)
.github (Public)
MMBench-GUI (Public): Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android, and Web.
ReasonZoo (Public)
CompassVerifier (Public)
GPassK (Public): [ACL 2025] Are Your LLMs Capable of Stable Reasoning?
Creation-MMBench (Public)
CompassJudger (Public)
RaML (Public)
BotChat (Public)
Ada-LEval (Public)
MathBench (Public)
MMBench (Public)
ProSA (Public)
ANAH (Public)
GTA (Public)
oc_doc_website (Public)
CriticEval (Public)
lagent-cibench (Public)
hinode (Public)
storage (Public)
CompassBench (Public)