An interactive dashboard for analyzing and comparing the performance, cost, and capabilities of leading Large Language Models (LLMs). This tool fetches real-time data from the Hugging Face Open LLM Leaderboard and a custom data repository to provide up-to-date insights.
- Statistics View:
  - GPQA Score Timeline: Tracks the evolution of model performance (GPQA benchmark) over time, with interactive logos for major AI organizations.
  - Context Window Comparison: A comprehensive bar chart comparing the input and output context lengths of various models.
  - Performance vs. Cost Analysis: A scatter plot identifying the most cost-effective models based on their GPQA score and cost per million tokens.
- Comparator View:
  - Dynamic Model Selection: Search and select multiple models for a side-by-side comparison.
  - Benchmark Breakdown: Compares selected models across key benchmarks like MMLU-Pro, BBH, and MATH.
  - CO₂ Impact: Visualizes the carbon cost of training, highlighting the environmental impact relative to model size.
  - Detailed Data Table: Provides a sortable and filterable table with raw metrics for in-depth analysis.
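One way to read "most cost-effective" in the Performance vs. Cost Analysis is as a Pareto front: a model is kept if no other model is at least as good on GPQA score and at least as cheap per million tokens. The sketch below illustrates that idea only. The `ModelPoint` type, the `pareto_front` helper, and the sample values are hypothetical and are not taken from the dashboard's code or data.

```python
from typing import NamedTuple


class ModelPoint(NamedTuple):
    """Hypothetical record for one point on the scatter plot."""
    name: str
    gpqa_score: float      # higher is better
    cost_per_mtok: float   # USD per million tokens, lower is better


def pareto_front(models):
    """Return the models that no other model beats on both score and cost."""
    front = []
    for m in models:
        dominated = any(
            o.gpqa_score >= m.gpqa_score
            and o.cost_per_mtok <= m.cost_per_mtok
            and (o.gpqa_score > m.gpqa_score or o.cost_per_mtok < m.cost_per_mtok)
            for o in models
        )
        if not dominated:
            front.append(m)
    # Cheapest first, so the result reads left to right on a cost axis.
    return sorted(front, key=lambda m: m.cost_per_mtok)


# Placeholder values for illustration only, not real benchmark results.
sample = [
    ModelPoint("model-a", 40.0, 1.0),
    ModelPoint("model-b", 55.0, 5.0),
    ModelPoint("model-c", 50.0, 8.0),   # worse and pricier than model-b
    ModelPoint("model-d", 60.0, 20.0),
]
front_names = [m.name for m in pareto_front(sample)]
print(front_names)
```

The pairwise comparison is quadratic in the number of models, which is fine for a leaderboard-sized list and keeps the dominance rule easy to read.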
- Backend & Dashboard: Python, Dash, Plotly
- Data Processing: Pandas, NumPy
- Data Sources: Hugging Face `datasets` API, GitHub API
- Performance: In-memory caching with time-based invalidation and parallel data fetching using `ThreadPoolExecutor`.
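The caching and parallel-fetch pattern named above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual implementation. The `TTLCache` class, the `fetch_all` helper, and the stand-in fetcher functions are hypothetical names introduced here.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]           # still fresh: skip the fetch
        value = fetch()             # missing or expired: fetch again
        self._store[key] = (now + self.ttl, value)
        return value


def fetch_all(cache, fetchers, max_workers=4):
    """Run every fetcher in a thread pool, going through the cache.

    Each key is submitted once per call. There is no per-key lock, so two
    overlapping calls to fetch_all may both fetch the same expired key.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(cache.get_or_fetch, name, fn)
            for name, fn in fetchers.items()
        }
        return {name: fut.result() for name, fut in futures.items()}


# Stand-in fetchers; a real app would call the remote data sources here.
def fake_leaderboard():
    return ["leaderboard-row"]


def fake_custom_repo():
    return ["custom-row"]


cache = TTLCache(ttl_seconds=300)
data = fetch_all(cache, {"leaderboard": fake_leaderboard, "custom": fake_custom_repo})
print(sorted(data))
```

Threads suit this workload because the fetches are I/O-bound, and `time.monotonic` avoids surprises when the system clock is adjusted while entries are cached.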
- Clone the repository:
  `git clone https://github.com/codebywiam/llm-analytics-dashboard.git`
  `cd llm-analytics-dashboard`
- Create and activate a virtual environment:
  `python -m venv venv`
  `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`)
- Install dependencies:
  `pip install -r requirements.txt`
- Run the application:
  `python app.py`
  The app will be available at http://127.0.0.1:8050.
- Implement a more robust data versioning system instead of relying solely on live APIs.
- Add unit and integration tests for data processing functions.
- Containerize the application with Docker for easier deployment.
This project is licensed under the MIT License. See LICENSE for details.








