From 3bcd5ff3c5d58cc6c5bf27b4ccb986d413a672c2 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 24 Jul 2025 20:02:48 -0400
Subject: [PATCH 01/26] Update README.md
---
README.md | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 47 insertions(+), 1 deletion(-)
diff --git a/README.md b/README.md
index c3138ed..421041a 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,48 @@
# Python-Data-Science-Onboarding
-Coming Soon
+
+# Onboarding Tutorial
+
+Welcome to the WMU DSC/Developer Club!
+
+This repository is designed to help new members get familiar with the tools and workflows commonly used in our data science projects.
+
+---
+
+## Who is this for?
+
+This tutorial assumes you already have *basic Python knowledge*, including:
+
+- Using numpy and pandas for data handling
+- Knowing what a .ipynb Jupyter Notebook file is
+- Using scikit-learn to build simple machine learning models
+
+> *Don't know Python yet?* No problem!
+> Start with the resources below before continuing:
+>
+> - [W3Schools Python Tutorial](https://www.w3schools.com/python/)
+> - [Google's Python Class](https://developers.google.com/edu/python)
+> - [Python for Beginners (YouTube)](https://www.youtube.com/watch?v=K5KVEU3aaeQ&t=56s)
+
+
+ Python Installation Guide (For Beginners)
+
+To follow along with the notebooks in this repository, you need Python installed on your machine.
+
+### How to Install Python
+
+ [For macOS](https://www.youtube.com/watch?v=nhv82tvFfkM)
+ [For Windows](https://www.youtube.com/watch?v=YagM_FuPLQU)
+
+> *Important*: During installation, make sure to check:
+> *“Add Python to PATH”*
+
+### Verify Your Installation
+
+After installing, open a terminal (or Command Prompt on Windows), and run:
+
+```bash
+python --version
+pip --version
+```
+
+
From ae5999ce70db1b781bbc11b658aa23235b5a0211 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 24 Jul 2025 20:12:00 -0400
Subject: [PATCH 02/26] Update README.md
---
README.md | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/README.md b/README.md
index 421041a..24fc5b7 100644
--- a/README.md
+++ b/README.md
@@ -46,3 +46,18 @@ pip --version
```
+
+---
+
+## Core Topics
+
+- [Data Handling with NumPy & Pandas]([notebooks/tutorial0/tutorial0.ipynb](https://github.com/cereal-with-water/Numpy-Pandas-tutorial))
+ Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames.
+
+- [Understanding Jupyter Notebooks (`.ipynb`)]([docs/ipynb_guide.md](https://github.com/cereal-with-water/Jupyter-Notebooks-tutorial))
+ What are text vs code cells, how to run them, and best practices for documenting your analysis.
+
+- [Basic Machine Learning with scikit-learn]([notebooks/tutorial1/tutorial1.ipynb](https://github.com/cereal-with-water/ML-tools-tutorial))
+ Build your first regression and classification models, split data, and evaluate performance.
+
+---
From 178a67e500975434902e4673a368996c8c4c5a0a Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 24 Jul 2025 20:14:02 -0400
Subject: [PATCH 03/26] Update README.md
---
README.md | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index 24fc5b7..6e449c0 100644
--- a/README.md
+++ b/README.md
@@ -51,13 +51,14 @@ pip --version
## Core Topics
-- [Data Handling with NumPy & Pandas]([notebooks/tutorial0/tutorial0.ipynb](https://github.com/cereal-with-water/Numpy-Pandas-tutorial))
+- [Data Handling with NumPy & Pandas](https://github.com/cereal-with-water/Numpy-Pandas-tutorial)
Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames.
-- [Understanding Jupyter Notebooks (`.ipynb`)]([docs/ipynb_guide.md](https://github.com/cereal-with-water/Jupyter-Notebooks-tutorial))
+- [Understanding Jupyter Notebooks (`.ipynb`)](https://github.com/cereal-with-water/Jupyter-Notebooks-tutorial)
What are text vs code cells, how to run them, and best practices for documenting your analysis.
-- [Basic Machine Learning with scikit-learn]([notebooks/tutorial1/tutorial1.ipynb](https://github.com/cereal-with-water/ML-tools-tutorial))
+- [Basic Machine Learning with scikit-learn](https://github.com/cereal-with-water/ML-tools-tutorial)
Build your first regression and classification models, split data, and evaluate performance.
---
+
From 5c2a155f1778a1602698c078d573b19c53f28c46 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 24 Jul 2025 20:27:17 -0400
Subject: [PATCH 04/26] Update README.md
---
README.md | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/README.md b/README.md
index 6e449c0..d22c1db 100644
--- a/README.md
+++ b/README.md
@@ -49,6 +49,32 @@ pip --version
---
+## Recommended Libraries
+
+In Python, you install packages by running:
+```bash
+pip install <package-name>
+```
+
+Before you dive into the notebooks, make sure you have the core data-science libraries installed. You can install them all at once via pip:
+
+```bash
+pip install \
+ numpy \
+ pandas \
+ matplotlib \
+ seaborn \
+ scikit-learn \
+ notebook
+```
+
+Or, if you prefer a single command:
+```
+pip install numpy pandas matplotlib seaborn scikit-learn notebook
+```
+
+---
+
## Core Topics
- [Data Handling with NumPy & Pandas](https://github.com/cereal-with-water/Numpy-Pandas-tutorial)
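
Once the install command in this patch has finished, one way to confirm that each library is importable is sketched below. It uses only the standard library. The helper name `missing_packages` is hypothetical, and the list holds the usual import names for these packages (scikit-learn is imported as `sklearn`, not `scikit-learn`).

```python
import importlib.util

# Import names for the packages installed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "notebook"]


def missing_packages(names):
    """Return the top-level import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All core libraries are importable.")
```

Anything reported as missing can be installed individually with `pip install` followed by the package name.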
From c41ad9bc4832c33a4eacf419322989b99460f4e7 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Tue, 12 Aug 2025 02:38:38 -0400
Subject: [PATCH 05/26] Update README.md
---
README.md | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 140 insertions(+), 8 deletions(-)
diff --git a/README.md b/README.md
index d22c1db..696687b 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,5 @@
# Python-Data-Science-Onboarding
-# Onboarding Tutorial
-
Welcome to the WMU DSC/Developer Club!
This repository is designed to help new members get familiar with the tools and workflows commonly used in our data science projects.
@@ -24,7 +22,7 @@ This tutorial assumes you already have *basic Python knowledge*, including:
> - [Python for Beginners (YouTube)](https://www.youtube.com/watch?v=K5KVEU3aaeQ&t=56s)
- Python Installation Guide (For Beginners)
+ Python Installation Guide For Beginners
To follow along with the notebooks in this repository, you need Python installed on your machine.
@@ -77,14 +75,148 @@ pip install numpy pandas matplotlib seaborn scikit-learn notebook
## Core Topics
-- [Data Handling with NumPy & Pandas](https://github.com/cereal-with-water/Numpy-Pandas-tutorial)
- Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames.
+
+ Data Handling with NumPy & Pandas
+ Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames.
+ # Numpy & Pandas
+
+## Library Overview
+
+Before we dive in, here's a quick intro to the two core libraries we'll use:
+
+### NumPy
+- **The fundamental package for numerical computing in Python.**
+- **Key features:**
+ - **Arrays:** Homogeneous, N-dimensional arrays (faster and more memory-efficient than Python lists)
+ - **Vectorized ops:** Element-wise arithmetic without explicit loops
+ - **Linear algebra & random:** Built-in support for matrix operations and pseudo-random number generation
+
+### Pandas
+- **A powerful data analysis and manipulation library built on top of NumPy.**
+- **Key features:**
+ - **DataFrame:** 2D tabular data structure with labeled axes (rows & columns)
+ - **IO tools:** Read/write CSV, Excel, SQL, JSON, and more
+ - **Series:** 1D labeled array, great for time series and single-column tables
+ - **Grouping & aggregation:** Split-apply-combine workflows for summarizing data
+
+
+
+### 1. What
+> **What you will learn in this section.**
+> By the end of this notebook, you will be able to:
+> - Create and manipulate NumPy arrays of different shapes and dtypes
+> - Perform element-wise arithmetic and universal functions
+> - Index, slice, and reshape arrays for efficient computation
+
+---
+
+### 2. Why
+> **Why this topic matters.**
+> NumPy arrays are the foundation of nearly all scientific computing in Python.
+> They provide:
+> - **Speed:** Vectorized operations run much faster than Python loops
+> - **Memory efficiency:** Compact storage of homogeneous data
+> - **Interoperability:** A common data structure for libraries like Pandas, SciPy, and scikit-learn
+
+---
-- [Understanding Jupyter Notebooks (`.ipynb`)](https://github.com/cereal-with-water/Jupyter-Notebooks-tutorial)
+### 3. How
+> **How to do it.**
+> Follow these step-by-step examples:
+
+```python
+import numpy as np
+
+# 1) Create arrays
+a = np.array([1, 2, 3, 4])
+b = np.arange(0, 10, 2) # [0, 2, 4, 6, 8]
+c = np.zeros((2, 3), dtype=int) # 2×3 array of zeros
+
+# 2) Element-wise arithmetic
+sum_ab = a + b[:4] # adds element by element
+prod_ab = a * b[:4] # multiplies element by element
+
+# 3) Universal functions
+sqrt_b = np.sqrt(b) # square root of each element
+exp_a = np.exp(a) # e^a for each element
+
+# 4) Indexing & slicing
+row = b[2:5] # slice subarray
+c[0, :] = row # assign a row
+
+# 5) Reshape & combine
+d = np.linspace(0, 1, 6).reshape(2, 3)
+stacked = np.vstack([c, d]) # vertical stack of two 2×3 arrays
+
+
+```
+
+
+
+
+
+
+ Understanding Jupyter Notebooks (.ipynb)
What are text vs code cells, how to run them, and best practices for documenting your analysis.
+ # Jupyter Notebook Quickstart Guide
-- [Basic Machine Learning with scikit-learn](https://github.com/cereal-with-water/ML-tools-tutorial)
- Build your first regression and classification models, split data, and evaluate performance.
+This guide will introduce you to Jupyter Notebook, from "what it is" to how to install and use it locally or in the cloud, then walk you through basic operations, hands-on examples, Markdown usage, and sharing.
+
+---
+
+## What Is Jupyter Notebook?
+
+Jupyter Notebook is an interactive computing environment where you can combine live code, equations, visualizations, and narrative text in a single document (`.ipynb`). It's widely used for data analysis, teaching, and rapid prototyping.
+
+- **Key Features**
+ - Interactive code execution
+ - Rich text via Markdown (headings, lists, LaTeX)
+ - Inline data visualizations
+ - Easy sharing and reproducibility
+
+---
+
+## Installation & Access
+
+### 1. Install Locally
+
+You'll need Python installed first. Then:
+
+```bash
+# Install Jupyter Notebook via pip
+pip install notebook
+```
+Or, if you use Conda:
+```bash
+conda install -c conda-forge notebook
+```
+After installation, launch the notebook server:
+```bash
+jupyter notebook
+```
+Your default browser will open at http://localhost:8888, showing the notebook dashboard.
+
+### 2. Use JupyterLab (Optional)
+For a more full-featured interface:
+
+```bash
+pip install jupyterlab
+jupyter lab
+```
+### 3. Cloud / Web Options
+Google Colab
+
+1. Go to colab.research.google.com
+2. Sign in with your Google account
+3. Open or upload any .ipynb file
+
+
+
+
+
+
+ Basic Machine Learning with scikit-learn
+ Build your first regression and classification models, split data, and evaluate performance.
---
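
A `.ipynb` file, as described in the Jupyter section of this patch, is plain JSON. The sketch below builds a minimal two-cell notebook as a dictionary and lists its cells using only the standard library. The field names follow notebook format version 4; the helper name `summarize_cells` and the cell contents are illustrative, not part of the repository.

```python
import json

# A minimal notebook in format version 4: one Markdown cell and one code cell.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Title\n", "Some notes."]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello')\n"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}


def summarize_cells(nb):
    """Return (cell_type, first source line) for every cell in a notebook dict."""
    summary = []
    for cell in nb["cells"]:
        source = cell["source"]
        # "source" may be stored as one string or as a list of lines.
        text = source if isinstance(source, str) else "".join(source)
        first_line = text.splitlines()[0] if text else ""
        summary.append((cell["cell_type"], first_line))
    return summary


# Round-trip through JSON text, as happens when a notebook is saved and reopened.
reloaded = json.loads(json.dumps(notebook))
print(summarize_cells(reloaded))
```

Because the format is JSON, the same approach works on a real notebook by loading the file with `json.load` instead of building the dictionary by hand.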
From 53136abf5efff5c92aa4b50346e7dd5141976f4c Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Tue, 12 Aug 2025 03:58:04 -0400
Subject: [PATCH 06/26] Update README.md
---
README.md | 61 +++++++++++++++++++++++++++----------------------------
1 file changed, 30 insertions(+), 31 deletions(-)
diff --git a/README.md b/README.md
index 696687b..addcdc6 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,11 @@
# Python-Data-Science-Onboarding
-Welcome to the WMU DSC/Developer Club!
-
+Welcome to the WMU DSC/Developer Club!
This repository is designed to help new members get familiar with the tools and workflows commonly used in our data science projects.
+
+
+
----
## Who is this for?
@@ -14,23 +15,25 @@ This tutorial assumes you already have *basic Python knowledge*, including:
- Knowing what a .ipynb Jupyter Notebook file is
- Using scikit-learn to build simple machine learning models
-> *Don't know Python yet?* No problem!
-> Start with the resources below before continuing:
->
-> - [W3Schools Python Tutorial](https://www.w3schools.com/python/)
-> - [Google's Python Class](https://developers.google.com/edu/python)
-> - [Python for Beginners (YouTube)](https://www.youtube.com/watch?v=K5KVEU3aaeQ&t=56s)
-
- Python Installation Guide For Beginners
+Don't know Python yet? No problem!
+
-To follow along with the notebooks in this repository, you need Python installed on your machine.
+> **Start with the resources below before continuing:**
+> [W3Schools Python Tutorial](https://www.w3schools.com/python/)
+> [Google's Python Class](https://developers.google.com/edu/python)
+> [Python for Beginners (YouTube)](https://www.youtube.com/watch?v=K5KVEU3aaeQ&t=56s)
+
-### How to Install Python
-
- [For macOS](https://www.youtube.com/watch?v=nhv82tvFfkM)
- [For Windows](https://www.youtube.com/watch?v=YagM_FuPLQU)
+
+ Python Installation Guide For Beginners
+
+
+> ### To follow along with the notebooks in this repository, you need Python installed on your machine.
+> ### How to Install Python
+> [For macOS](https://www.youtube.com/watch?v=nhv82tvFfkM)
+> [For Windows](https://www.youtube.com/watch?v=YagM_FuPLQU)
> *Important*: During installation, make sure to check:
> *“Add Python to PATH”*
@@ -42,10 +45,11 @@ After installing, open a terminal (or Command Prompt on Windows), and run:
python --version
pip --version
```
-
+
+
+
----
## Recommended Libraries
@@ -65,20 +69,16 @@ pip install \
scikit-learn \
notebook
```
+
+
-Or, if you prefer a single command:
-```
-pip install numpy pandas matplotlib seaborn scikit-learn notebook
-```
----
## Core Topics
Data Handling with NumPy & Pandas
Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames.
- # Numpy & Pandas
## Library Overview
@@ -150,15 +150,14 @@ stacked = np.vstack([c, d]) # vertical stack of two 2×3 arrays
```
+
-
-
Understanding Jupyter Notebooks (.ipynb)
- What are text vs code cells, how to run them, and best practices for documenting your analysis.
- # Jupyter Notebook Quickstart Guide
+What are text vs code cells, how to run them, and best practices for documenting your analysis.
+# Jupyter Notebook Quickstart Guide
This guide will introduce you to Jupyter Notebook, from "what it is" to how to install and use it locally or in the cloud, then walk you through basic operations, hands-on examples, Markdown usage, and sharing.
@@ -209,14 +208,14 @@ Google Colab
1. Go to colab.research.google.com
2. Sign in with your Google account
3. Open or upload any .ipynb file
+
-
-
Basic Machine Learning with scikit-learn
- Build your first regression and classification models, split data, and evaluate performance.
+ Build your first regression and classification models, split data, and evaluate performance.
+
---
From 395646502f2eca498eb5c6177a2c30d7920b17f2 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Tue, 12 Aug 2025 03:58:27 -0400
Subject: [PATCH 07/26] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index addcdc6..f68d1d2 100644
--- a/README.md
+++ b/README.md
@@ -217,5 +217,5 @@ Google Colab
Build your first regression and classification models, split data, and evaluate performance.
----
+
From 7db8abb6b4d97312eb5dc64a82a3883cb0173d11 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Tue, 12 Aug 2025 22:52:27 -0400
Subject: [PATCH 08/26] Libraries
---
libraries | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
create mode 100644 libraries
diff --git a/libraries b/libraries
new file mode 100644
index 0000000..816ac87
--- /dev/null
+++ b/libraries
@@ -0,0 +1,45 @@
+# Top 26 Python Libraries for Data Science
+
+## Staple Python Libraries for Data Science
+1. **NumPy** - Core numerical computing library in Python, offering fast operations on multi-dimensional arrays and matrices, essential for scientific computing and linear algebra.
+2. **pandas** - Powerful data analysis/manipulation tool providing DataFrame structures, easy I/O with multiple file formats, and advanced indexing, grouping, and time series functionality.
+3. **Matplotlib** - Fundamental plotting library for creating static, interactive, and animated visualizations with full customization.
+4. **Seaborn** - High-level statistical visualization library built on Matplotlib, offering attractive and informative default styles for complex plots.
+5. **Plotly** - Interactive graphing library for web-based visualizations, supporting 3D charts and dashboards via Dash.
+6. **scikit-learn** - Comprehensive machine learning library for classification, regression, clustering, and preprocessing, with a consistent API.
+
+---
+
+## Machine Learning Python Libraries
+7. **LightGBM** - Gradient boosting framework optimized for speed, memory efficiency, and accuracy, supporting large-scale and GPU-based learning.
+8. **XGBoost** - Widely used gradient boosting library known for performance in Kaggle competitions, supporting distributed training and multiple platforms.
+9. **CatBoost** - High-performance gradient boosting library with strong categorical feature handling and excellent CPU/GPU support.
+10. **Statsmodels** - Statistical modeling library for regression, hypothesis testing, and time series analysis, with an R-like interface.
+11. **RAPIDS cuDF/cuML** - NVIDIA GPU-accelerated libraries for DataFrame manipulation (cuDF) and machine learning (cuML) with pandas- and scikit-learn-like APIs.
+12. **Optuna** - Hyperparameter optimization framework with efficient algorithms, pruning, and visualization tools.
+
+---
+
+## Automated Machine Learning (AutoML) Python Libraries
+13. **PyCaret** - Low-code machine learning library automating the end-to-end ML workflow for rapid experimentation.
+14. **H2O** - Scalable ML platform for big data, supporting distributed computing and AutoML.
+15. **TPOT** - AutoML tool using genetic programming to optimize ML pipelines automatically.
+16. **auto-sklearn** - Automated model selection and hyperparameter tuning built on scikit-learn with Bayesian optimization.
+17. **FLAML** - Lightweight AutoML library focused on finding accurate models quickly with minimal computational cost.
+
+---
+
+## Deep Learning Python Libraries
+18. **TensorFlow** - Google's open-source ML framework for scalable deep learning, offering APIs for building, training, and deploying models.
+19. **PyTorch** - Facebook's deep learning framework known for dynamic computation graphs, ease of use, and strong research-to-production transition.
+20. **FastAI** - High-level deep learning library on PyTorch with concise APIs for state-of-the-art results.
+21. **Keras** - User-friendly deep learning API integrated with TensorFlow, designed for quick prototyping and experimentation.
+22. **PyTorch Lightning** - Lightweight wrapper for PyTorch that organizes code for reproducibility and scalability.
+
+---
+
+## Natural Language Processing Python Libraries
+23. **NLTK** - Comprehensive NLP toolkit for tokenization, parsing, and linguistic processing with access to corpora like WordNet.
+24. **spaCy** - Industrial-strength NLP library for large-scale text processing, supporting deep learning integration and 60+ languages.
+25. **Gensim** - Topic modeling and vector space modeling library optimized for large corpora and memory efficiency.
+26. **Hugging Face Transformers** - Library for state-of-the-art transformer-based models for text, vision, audio, and multimodal tasks, supporting PyTorch, TensorFlow, and JAX.
From ee1046a9459d6ce4deeffb3a76e51c20e9588605 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Tue, 12 Aug 2025 22:54:08 -0400
Subject: [PATCH 09/26] Libraries
---
libraries => libraries.md | 0
1 file changed, 0 insertions(+), 0 deletions(-)
rename libraries => libraries.md (100%)
diff --git a/libraries b/libraries.md
similarity index 100%
rename from libraries
rename to libraries.md
From 0f42452004e608fc565aa6b238f83e0177786264 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Tue, 12 Aug 2025 22:58:33 -0400
Subject: [PATCH 10/26] Update libraries.md
---
libraries.md | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/libraries.md b/libraries.md
index 816ac87..6744b24 100644
--- a/libraries.md
+++ b/libraries.md
@@ -1,5 +1,7 @@
# Top 26 Python Libraries for Data Science
+
+
## Staple Python Libraries for Data Science
1. **NumPy** - Core numerical computing library in Python, offering fast operations on multi-dimensional arrays and matrices, essential for scientific computing and linear algebra.
2. **pandas** - Powerful data analysis/manipulation tool providing DataFrame structures, easy I/O with multiple file formats, and advanced indexing, grouping, and time series functionality.
@@ -8,26 +10,29 @@
5. **Plotly** - Interactive graphing library for web-based visualizations, supporting 3D charts and dashboards via Dash.
6. **scikit-learn** - Comprehensive machine learning library for classification, regression, clustering, and preprocessing, with a consistent API.
----
+
+
## Machine Learning Python Libraries
7. **LightGBM** - Gradient boosting framework optimized for speed, memory efficiency, and accuracy, supporting large-scale and GPU-based learning.
8. **XGBoost** - Widely used gradient boosting library known for performance in Kaggle competitions, supporting distributed training and multiple platforms.
9. **CatBoost** - High-performance gradient boosting library with strong categorical feature handling and excellent CPU/GPU support.
10. **Statsmodels** - Statistical modeling library for regression, hypothesis testing, and time series analysis, with an R-like interface.
-11. **RAPIDS cuDF/cuML** - NVIDIA GPU-accelerated libraries for DataFrame manipulation (cuDF) and machine learning (cuML) with pandas- and scikit-learn-like APIs.
+11. **RAPIDS cuDF/cuML** - NVIDIA GPU-accelerated libraries for DataFrame manipulation and machine learning with pandas- and scikit-learn-like APIs.
12. **Optuna** - Hyperparameter optimization framework with efficient algorithms, pruning, and visualization tools.
----
+
-## Automated Machine Learning (AutoML) Python Libraries
+
+## Automated Machine Learning Python Libraries
13. **PyCaret** - Low-code machine learning library automating the end-to-end ML workflow for rapid experimentation.
14. **H2O** - Scalable ML platform for big data, supporting distributed computing and AutoML.
15. **TPOT** - AutoML tool using genetic programming to optimize ML pipelines automatically.
16. **auto-sklearn** - Automated model selection and hyperparameter tuning built on scikit-learn with Bayesian optimization.
17. **FLAML** - Lightweight AutoML library focused on finding accurate models quickly with minimal computational cost.
----
+
+
## Deep Learning Python Libraries
18. **TensorFlow** - Google's open-source ML framework for scalable deep learning, offering APIs for building, training, and deploying models.
@@ -36,7 +41,8 @@
21. **Keras** - User-friendly deep learning API integrated with TensorFlow, designed for quick prototyping and experimentation.
22. **PyTorch Lightning** - Lightweight wrapper for PyTorch that organizes code for reproducibility and scalability.
----
+
+
## Natural Language Processing Python Libraries
23. **NLTK** - Comprehensive NLP toolkit for tokenization, parsing, and linguistic processing with access to corpora like WordNet.
From 9823fdae28a04ab48c87cc5bbd2184005e5d42c7 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Wed, 13 Aug 2025 00:17:56 -0400
Subject: [PATCH 11/26] Update README.md
---
README.md | 116 +++++++++++++++++++++++++++---------------------------
1 file changed, 58 insertions(+), 58 deletions(-)
diff --git a/README.md b/README.md
index f68d1d2..cff496d 100644
--- a/README.md
+++ b/README.md
@@ -76,6 +76,64 @@ pip install \
## Core Topics
+
+ Understanding Jupyter Notebooks (.ipynb)
+What are text vs code cells, how to run them, and best practices for documenting your analysis.
+# Jupyter Notebook Quickstart Guide
+
+This guide will introduce you to Jupyter Notebook, from "what it is" to how to install and use it locally or in the cloud, then walk you through basic operations, hands-on examples, Markdown usage, and sharing.
+
+---
+
+## What Is Jupyter Notebook?
+
+Jupyter Notebook is an interactive computing environment where you can combine live code, equations, visualizations, and narrative text in a single document (`.ipynb`). It's widely used for data analysis, teaching, and rapid prototyping.
+
+- **Key Features**
+ - Interactive code execution
+ - Rich text via Markdown (headings, lists, LaTeX)
+ - Inline data visualizations
+ - Easy sharing and reproducibility
+
+---
+
+## Installation & Access
+
+### 1. Install Locally
+
+You'll need Python installed first. Then:
+
+```bash
+# Install Jupyter Notebook via pip
+pip install notebook
+```
+Or, if you use Conda:
+```bash
+conda install -c conda-forge notebook
+```
+After installation, launch the notebook server:
+```bash
+jupyter notebook
+```
+Your default browser will open at http://localhost:8888, showing the notebook dashboard.
+
+### 2. Use JupyterLab (Optional)
+For a more full-featured interface:
+
+```bash
+pip install jupyterlab
+jupyter lab
+```
+### 3. Cloud / Web Options
+Google Colab
+
+1. Go to colab.research.google.com
+2. Sign in with your Google account
+3. Open or upload any .ipynb file
+
+
+
+
Data Handling with NumPy & Pandas
Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames.
@@ -154,64 +212,6 @@ stacked = np.vstack([c, d]) # vertical stack of two 2×3 arrays
-
- Understanding Jupyter Notebooks (.ipynb)
-What are text vs code cells, how to run them, and best practices for documenting your analysis.
-# Jupyter Notebook Quickstart Guide
-
-This guide will introduce you to Jupyter Notebook, from "what it is" to how to install and use it locally or in the cloud, then walk you through basic operations, hands-on examples, Markdown usage, and sharing.
-
----
-
-## What Is Jupyter Notebook?
-
-Jupyter Notebook is an interactive computing environment where you can combine live code, equations, visualizations, and narrative text in a single document (`.ipynb`). It's widely used for data analysis, teaching, and rapid prototyping.
-
-- **Key Features**
- - Interactive code execution
- - Rich text via Markdown (headings, lists, LaTeX)
- - Inline data visualizations
- - Easy sharing and reproducibility
-
----
-
-## Installation & Access
-
-### 1. Install Locally
-
-You'll need Python installed first. Then:
-
-```bash
-# Install Jupyter Notebook via pip
-pip install notebook
-```
-Or, if you use Conda:
-```bash
-conda install -c conda-forge notebook
-```
-After installation, launch the notebook server:
-```bash
-jupyter notebook
-```
-Your default browser will open at http://localhost:8888, showing the notebook dashboard.
-
-### 2. Use JupyterLab (Optional)
-For a more full-featured interface:
-
-```bash
-pip install jupyterlab
-jupyter lab
-```
-### 3. Cloud / Web Options
-Google Colab
-
-1. Go to colab.research.google.com
-2. Sign in with your Google account
-3. Open or upload any .ipynb file
-
-
-
-
Basic Machine Learning with scikit-learn
Build your first regression and classification models, split data, and evaluate performance.
From 9a2f831c24713564b93c6ae9eead5bc42fcbed2f Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Mon, 18 Aug 2025 15:32:08 -0400
Subject: [PATCH 12/26] Create man.md
---
tutorial/man.md | 1 +
1 file changed, 1 insertion(+)
create mode 100644 tutorial/man.md
diff --git a/tutorial/man.md b/tutorial/man.md
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/tutorial/man.md
@@ -0,0 +1 @@
+
From c800a1fa986ba7d585bfc52f1ba70f2a11085583 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Mon, 18 Aug 2025 16:05:43 -0400
Subject: [PATCH 13/26] Delete tutorial/man.md
---
tutorial/man.md | 1 -
1 file changed, 1 deletion(-)
delete mode 100644 tutorial/man.md
diff --git a/tutorial/man.md b/tutorial/man.md
deleted file mode 100644
index 8b13789..0000000
--- a/tutorial/man.md
+++ /dev/null
@@ -1 +0,0 @@
-
From ceb0bf1d7aea6ceab7a518e8377439e2fa99d2d2 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Mon, 18 Aug 2025 16:59:36 -0400
Subject: [PATCH 14/26] Create 01_numpy_basics.md
---
checkpoints/01_numpy_basics.md | 71 ++++++++++++++++++++++++++++++++++
1 file changed, 71 insertions(+)
create mode 100644 checkpoints/01_numpy_basics.md
diff --git a/checkpoints/01_numpy_basics.md b/checkpoints/01_numpy_basics.md
new file mode 100644
index 0000000..594e7bc
--- /dev/null
+++ b/checkpoints/01_numpy_basics.md
@@ -0,0 +1,71 @@
+# Checkpoint 01 - NumPy Basics
+
+**Goal**
+- Create/reshape arrays, vectorized ops, boolean masking
+
+**Rules**
+- Fill only where marked as `# TODO`
+- Do not change test cells (marked `# Test`)
+- Run all cells before submitting
+
+**References**
+- NumPy docs: https://numpy.org/doc/
+
+---
+
+```python
+# Setup
+import numpy as np
+import pandas as pd
+from utils.grader import check_array, check_value
+
+np.random.seed(42)
+```
+
+
+# Q1) Create a 3x3 array with values 0..8 (row-major)
+# TODO: assign to variable 'A'
+A = ... # TODO
+
+# Test
+check_array(A, shape=(3,3), dtype=np.integer)
+check_value(A.sum(), 36)
+
+# Q2) From A, create a boolean mask selecting even numbers
+# TODO: assign to variable 'mask_even'
+mask_even = ... # TODO
+
+# Test
+check_array(mask_even, shape=(3,3), dtype=bool)
+check_value(int(mask_even.sum()), 5) # number of evens in 0..8
+
+# Q3) Reshape, stack, and compute row-wise means -> 'means'
+# TODO: assign to variable 'means' (1D array length 3)
+B = ... # TODO
+means = ... # TODO
+
+# Test
+check_array(means, shape=(3,))
+
+
+# Q4) Broadcasting: A (3x3) and v (1x3) -> 'C'
+v = np.array([10, 0, -10])
+C = ... # TODO
+
+# Test
+check_array(C, shape=(3,3), dtype=np.integer)
+check_value(int(C[0,0] + C[2,2]), (A[0,0]+10) + (A[2,2]-10))
+
+
+# Q5) Fancy indexing / boolean masking
+# Extract odd numbers ≥ 3 from A -> 'odd_ge3'
+odd_ge3 = ... # TODO
+
+# Test
+check_array(
+ odd_ge3,
+ shape=(np.count_nonzero((A>=3)&(A%2==1)),),
+ dtype=np.integer,
+ allow_int_any=True
+)
+check_value(int(odd_ge3.min()), 3)
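
The checkpoint above imports `check_array` and `check_value` from `utils.grader`, which this patch does not include. The sketch below shows one way such helpers could behave, inferred only from the call sites in the checkpoint. The signatures, the `tol` parameter, and the handling of extra keywords such as `allow_int_any` are assumptions, not the repository's actual grader.

```python
import numpy as np


def check_array(arr, shape=None, dtype=None, **_unused):
    """Assert that arr is an ndarray with the expected shape and dtype family.

    Extra keywords (for example allow_int_any) are accepted and ignored here,
    because their intended meaning is not stated in the checkpoint.
    """
    assert isinstance(arr, np.ndarray), f"expected ndarray, got {type(arr).__name__}"
    if shape is not None:
        assert arr.shape == tuple(shape), f"expected shape {shape}, got {arr.shape}"
    if dtype is not None:
        # np.issubdtype also accepts abstract families such as np.integer.
        assert np.issubdtype(arr.dtype, dtype), f"unexpected dtype {arr.dtype}"
    return True


def check_value(actual, expected, tol=1e-9):
    """Assert that two numbers agree within an absolute tolerance."""
    assert abs(actual - expected) <= tol, f"expected {expected}, got {actual}"
    return True


# Usage on an array unrelated to the exercises.
demo = np.ones((2, 2), dtype=int)
check_array(demo, shape=(2, 2), dtype=np.integer)
check_value(demo.sum(), 4)
print("demo checks passed")
```

Returning `True` on success keeps the helpers usable both as bare statements in a notebook cell and inside `assert` expressions.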
From 8d2330cd09b2a35d9f0199b8ca47041d0c2a0972 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 21 Aug 2025 16:52:53 -0400
Subject: [PATCH 15/26] Add files via upload
---
checkpoints/01_numpy_basics.ipynb | 137 ++++++++++++++++++++++++++++++
1 file changed, 137 insertions(+)
create mode 100644 checkpoints/01_numpy_basics.ipynb
diff --git a/checkpoints/01_numpy_basics.ipynb b/checkpoints/01_numpy_basics.ipynb
new file mode 100644
index 0000000..d237f78
--- /dev/null
+++ b/checkpoints/01_numpy_basics.ipynb
@@ -0,0 +1,137 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Checkpoint 01 - NumPy Basics\n\n",
+ "**Goal**\n",
+ "- Create/reshape arrays, vectorized ops, boolean masking\n\n",
+ "**Rules**\n",
+ "- Fill only where marked as `# TODO`\n",
+ "- Do not change test cells (marked `# Test`)\n",
+ "- Run all cells before submitting\n\n",
+ "**References**\n",
+ "- NumPy docs: https://numpy.org/doc/\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 🔧 Setup\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "from utils.grader import check_array, check_value\n\n",
+ "np.random.seed(42)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Q1) Create a 3x3 array with values 0..8 (row-major)\n",
+ "# TODO: assign to variable 'A'\n",
+ "A = ... # TODO\n\n",
+ "# π Test\n",
+ "check_array(A, shape=(3,3), dtype=np.integer)\n",
+ "check_value(A.sum(), 36)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Q2) From A, create a boolean mask selecting even numbers\n",
+ "# TODO: assign to variable 'mask_even'\n",
+ "mask_even = ... # TODO\n\n",
+ "# π Test\n",
+ "check_array(mask_even, shape=(3,3), dtype=bool)\n",
+ "check_value(int(mask_even.sum()), 5) # number of evens in 0..8\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Q3) Reshape, stack, and compute row-wise means → 'means'\n",
+ "# TODO: assign to variable 'means' (1D array length 3)\n",
+ "B = ... # TODO\n",
+ "means = ... # TODO\n\n",
+ "# π Test\n",
+ "check_array(means, shape=(3,))\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Q4) Broadcasting: A (3x3) and v (1x3) → 'C'\n",
+ "v = np.array([10, 0, -10])\n",
+ "C = ... # TODO\n\n",
+ "# π Test\n",
+ "check_array(C, shape=(3,3), dtype=np.integer)\n",
+ "check_value(int(C[0,0] + C[2,2]), (A[0,0]+10) + (A[2,2]-10))\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Q5) Fancy indexing / boolean masking\n",
+ "# Extract odd numbers ≥ 3 from A → 'odd_ge3'\n",
+ "odd_ge3 = ... # TODO\n\n",
+ "# π Test\n",
+ "check_array(\n",
+ " odd_ge3,\n",
+ " shape=(np.count_nonzero((A>=3)&(A%2==1)),),\n",
+ " dtype=np.integer,\n",
+ " allow_int_any=True\n",
+ ")\n",
+ "check_value(int(odd_ge3.min()), 3)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ✅ Submit\n",
+ "- All tests above passed\n",
+ "- Save notebook and commit to your repo\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.11.8",
+ "mimetype": "text/x-python",
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "pygments_lexer": "ipython3",
+ "nbconvert_exporter": "python",
+ "file_extension": ".py"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
\ No newline at end of file
From 8d83324a8f45a2771d87886a43812ccc452c2ea3 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 21 Aug 2025 17:39:13 -0400
Subject: [PATCH 16/26] Delete checkpoints/01_numpy_basics.md
---
checkpoints/01_numpy_basics.md | 71 ----------------------------------
1 file changed, 71 deletions(-)
delete mode 100644 checkpoints/01_numpy_basics.md
diff --git a/checkpoints/01_numpy_basics.md b/checkpoints/01_numpy_basics.md
deleted file mode 100644
index 594e7bc..0000000
--- a/checkpoints/01_numpy_basics.md
+++ /dev/null
@@ -1,71 +0,0 @@
-# ✅ Checkpoint 01 - NumPy Basics
-
-**Goal**
-- Create/reshape arrays, vectorized ops, boolean masking
-
-**Rules**
-- Fill only where marked as `# TODO`
-- Do not change test cells (π)
-- Run all cells before submitting
-
-**References**
-- NumPy docs: https://numpy.org/doc/
-
----
-
-```python
-# 🔧 Setup
-import numpy as np
-import pandas as pd
-from utils.grader import check_array, check_value
-
-np.random.seed(42)
-```
-
-
-# Q1) Create a 3x3 array with values 0..8 (row-major)
-# TODO: assign to variable 'A'
-A = ... # TODO
-
-# π Test
-check_array(A, shape=(3,3), dtype=np.integer)
-check_value(A.sum(), 36)
-
-# Q2) From A, create a boolean mask selecting even numbers
-# TODO: assign to variable 'mask_even'
-mask_even = ... # TODO
-
-# π Test
-check_array(mask_even, shape=(3,3), dtype=bool)
-check_value(int(mask_even.sum()), 5) # number of evens in 0..8
-
-# Q3) Reshape, stack, and compute row-wise means → 'means'
-# TODO: assign to variable 'means' (1D array length 3)
-B = ... # TODO
-means = ... # TODO
-
-# π Test
-check_array(means, shape=(3,))
-
-
-# Q4) Broadcasting: A (3x3) and v (1x3) → 'C'
-v = np.array([10, 0, -10])
-C = ... # TODO
-
-# π Test
-check_array(C, shape=(3,3), dtype=np.integer)
-check_value(int(C[0,0] + C[2,2]), (A[0,0]+10) + (A[2,2]-10))
-
-
-# Q5) Fancy indexing / boolean masking
-# Extract odd numbers ≥ 3 from A → 'odd_ge3'
-odd_ge3 = ... # TODO
-
-# π Test
-check_array(
- odd_ge3,
- shape=(np.count_nonzero((A>=3)&(A%2==1)),),
- dtype=np.integer,
- allow_int_any=True
-)
-check_value(int(odd_ge3.min()), 3)
From fb886093082f6e868c2b23df076b48f673846c59 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 21 Aug 2025 19:51:05 -0400
Subject: [PATCH 17/26] Create grader
---
checkpoints/utils/grader | 1 +
1 file changed, 1 insertion(+)
create mode 100644 checkpoints/utils/grader
diff --git a/checkpoints/utils/grader b/checkpoints/utils/grader
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/checkpoints/utils/grader
@@ -0,0 +1 @@
+
From ef40b2a489aaaeaaaf423397a5bd443d2c5f811d Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 21 Aug 2025 19:51:32 -0400
Subject: [PATCH 18/26] Add files via upload
---
checkpoints/utils/grader.py | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
create mode 100644 checkpoints/utils/grader.py
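
Reviewer note (illustrative only, not part of the patch): `check_array` compares dtypes with `np.issubdtype` against an abstract type such as `np.integer`. The sketch below shows how that comparison behaves for a few dtypes. The helper name `is_integer_array` is hypothetical and exists only for this example.

```python
import numpy as np

def is_integer_array(arr):
    """Hypothetical helper mirroring the ndarray + dtype test inside check_array."""
    return isinstance(arr, np.ndarray) and np.issubdtype(arr.dtype, np.integer)

# Any integer width (signed or unsigned) counts as np.integer;
# floats, booleans, and plain Python lists do not.
results = {
    "int64": is_integer_array(np.zeros(3, dtype=np.int64)),
    "uint8": is_integer_array(np.zeros(3, dtype=np.uint8)),
    "float64": is_integer_array(np.zeros(3, dtype=np.float64)),
    "bool": is_integer_array(np.zeros(3, dtype=bool)),
    "list": is_integer_array([1, 2, 3]),
}
print(results)
```

Because `np.integer` already matches every integer width, a call that passes `dtype=np.integer` appears to behave the same with or without `allow_int_any=True`.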
diff --git a/checkpoints/utils/grader.py b/checkpoints/utils/grader.py
new file mode 100644
index 0000000..e496ae3
--- /dev/null
+++ b/checkpoints/utils/grader.py
@@ -0,0 +1,18 @@
+import numpy as np
+
+def check_array(arr, shape=None, dtype=None, allow_int_any=False):
+    if not isinstance(arr, np.ndarray):
+        raise AssertionError(f"❌ Expected numpy.ndarray, got {type(arr)}")
+    if shape is not None and arr.shape != shape:
+        raise AssertionError(f"❌ Wrong shape: expected {shape}, got {arr.shape}")
+    if dtype is not None:
+        if allow_int_any and np.issubdtype(arr.dtype, np.integer):
+            pass
+        elif not np.issubdtype(arr.dtype, dtype):
+            raise AssertionError(f"❌ Wrong dtype: expected {dtype}, got {arr.dtype}")
+    print("✅ Array check passed.")
+
+def check_value(val, expected):
+    if val != expected:
+        raise AssertionError(f"❌ Wrong value: expected {expected}, got {val}")
+    print("✅ Value check passed.")
From e876d0fd9769a88dc010924a9a20b01586bcb5fc Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 21 Aug 2025 19:51:50 -0400
Subject: [PATCH 19/26] Delete checkpoints/utils/grader
---
checkpoints/utils/grader | 1 -
1 file changed, 1 deletion(-)
delete mode 100644 checkpoints/utils/grader
diff --git a/checkpoints/utils/grader b/checkpoints/utils/grader
deleted file mode 100644
index 8b13789..0000000
--- a/checkpoints/utils/grader
+++ /dev/null
@@ -1 +0,0 @@
-
From a56a159a9f300b3277de20b8e4a113ace9a09186 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Sat, 23 Aug 2025 19:24:31 -0400
Subject: [PATCH 20/26] Create 02_pandas_basics.ipynb
---
checkpoints/02_pandas_basics.ipynb | 167 +++++++++++++++++++++++++++++
1 file changed, 167 insertions(+)
create mode 100644 checkpoints/02_pandas_basics.ipynb
diff --git a/checkpoints/02_pandas_basics.ipynb b/checkpoints/02_pandas_basics.ipynb
new file mode 100644
index 0000000..839870c
--- /dev/null
+++ b/checkpoints/02_pandas_basics.ipynb
@@ -0,0 +1,167 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# ✅ Checkpoint 02 - pandas Basics\n",
+ "\n",
+ "**Goal**\n",
+ "- Load/create DataFrames, filter & sort, add computed columns, groupby/aggregate, and merge.\n",
+ "\n",
+ "**Rules**\n",
+ "- Fill only where marked as `# TODO`\n",
+ "- Do not change test cells (π)\n",
+ "- Run all cells before submitting\n",
+ "\n",
+ "**References**\n",
+ "- pandas docs: https://pandas.pydata.org/docs/\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 🔧 Setup\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "from utils.grader import (\n",
+ " check_array, check_value, check_dataframe_columns,\n",
+ " check_series_index_values, check_len\n",
+ ")\n",
+ "np.random.seed(42)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Small in-memory data we'll use throughout\n",
+ "data = {\n",
+ " 'city': ['Ann Arbor','Kalamazoo','Detroit','Grand Rapids','Lansing'],\n",
+ " 'temp_f': [68, 77, 59, 90, 82],\n",
+ " 'rain': [False, True, False, False, True],\n",
+ " 'date': pd.to_datetime(['2025-08-20','2025-08-20','2025-08-20','2025-08-20','2025-08-20'])\n",
+ "}\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Q1) Create a DataFrame 'df' from the dict 'data' with columns in order: city, temp_f, rain, date\n",
+ "# TODO: assign to variable 'df'\n",
+ "df = ... # TODO\n",
+ "\n",
+ "# π Test\n",
+ "check_dataframe_columns(df, ['city','temp_f','rain','date'])\n",
+ "check_value(df.iloc[0]['city'], 'Ann Arbor')\n",
+ "check_len(df, 5)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Q2) Filter rows where rain == False, sort by temp_f descending, reset index → 'df_dry'\n",
+ "# TODO: assign to variable 'df_dry'\n",
+ "df_dry = ... # TODO\n",
+ "\n",
+ "# π Test\n",
+ "check_len(df_dry, 3)\n",
+ "check_value(df_dry.iloc[0]['temp_f'], 90)\n",
+ "check_dataframe_columns(df_dry, ['city','temp_f','rain','date'])\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Q3) Add a Celsius column: temp_c = round((temp_f - 32) * 5/9, 1)\n",
+ "# TODO: create 'temp_c' column on df\n",
+ "...\n",
+ "\n",
+ "# π Test\n",
+ "check_value(float(df.loc[df['city']=='Grand Rapids','temp_c'].iloc[0]), round((90-32)*5/9,1))\n",
+ "check_dataframe_columns(df, ['city','temp_f','rain','date','temp_c'])\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Q4) Group by 'rain' and compute mean temp_c → 'avg_temp_by_rain' (Series indexed by rain boolean)\n",
+ "# TODO: assign to variable 'avg_temp_by_rain'\n",
+ "avg_temp_by_rain = ... # TODO\n",
+ "\n",
+ "# π Test (values checked approximately)\n",
+ "check_series_index_values(avg_temp_by_rain, {False, True})\n",
+ "mean_false = avg_temp_by_rain.loc[False]\n",
+ "mean_true = avg_temp_by_rain.loc[True]\n",
+ "check_value(round(float(mean_false),1), round(((68-32)*5/9 + (59-32)*5/9 + (90-32)*5/9)/3, 1))\n",
+ "check_value(round(float(mean_true),1), round(((77-32)*5/9 + (82-32)*5/9)/2, 1))\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Q5) Merge: create a DataFrame 'city_region' with columns city and region, then left-merge onto df β 'df_merged'\n",
+ "city_region = pd.DataFrame({\n",
+ " 'city': ['Ann Arbor','Kalamazoo','Detroit','Grand Rapids','Lansing'],\n",
+ " 'region': ['SE','SW','SE','W','C']\n",
+ "})\n",
+ "# TODO: left-merge on 'city' to produce df_merged\n",
+ "df_merged = ... # TODO\n",
+ "\n",
+ "# π Test\n",
+ "check_dataframe_columns(df_merged, ['city','temp_f','rain','date','temp_c','region'])\n",
+ "check_value(set(df_merged['region']), {'SE','SW','W','C'})\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ✅ Submit\n",
+ "- All tests above passed\n",
+ "- Save notebook and commit to your repo\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.11",
+ "mimetype": "text/x-python",
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "pygments_lexer": "ipython3",
+ "nbconvert_exporter": "python",
+ "file_extension": ".py"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
From d4a12100ff90748fb3fdfff4bb6bb996fa83d6fd Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Sun, 24 Aug 2025 20:29:26 -0400
Subject: [PATCH 21/26] Update grader.py
---
checkpoints/utils/grader.py | 46 ++++++++++++++++++++++++++++++++-----
1 file changed, 40 insertions(+), 6 deletions(-)
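
Reviewer note (illustrative only, not part of the patch): the new `tol` argument matters because exact `!=` comparison rejects floats that differ only by rounding error. The sketch below reproduces the comparison as a bool-returning function. The name `values_match` is hypothetical; the patch itself raises through `_fail` instead of returning.

```python
import numpy as np

def values_match(val, expected, tol=1e-8):
    """Hypothetical bool-returning version of the comparison in check_value."""
    if isinstance(val, (float, np.floating)) or isinstance(expected, (float, np.floating)):
        # Float path: accept anything within an absolute tolerance.
        return abs(float(val) - float(expected)) <= tol
    # Non-float path: fall back to ordinary equality (ints, strings, sets, ...).
    return val == expected

exact_equal = (0.1 + 0.2) == 0.3                 # rounding error makes this fail
tolerant_equal = values_match(0.1 + 0.2, 0.3)    # passes within tol
int_vs_float = values_match(36, 36.0)            # takes the float path
sets_equal = values_match({"SE", "SW"}, {"SW", "SE"})
print(exact_equal, tolerant_equal, int_vs_float, sets_equal)
```

The tolerance is absolute, so it suits the small rounded temperatures in Checkpoint 02; very large expected values might call for a relative tolerance instead.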
diff --git a/checkpoints/utils/grader.py b/checkpoints/utils/grader.py
index e496ae3..6237e45 100644
--- a/checkpoints/utils/grader.py
+++ b/checkpoints/utils/grader.py
@@ -1,18 +1,52 @@
 import numpy as np
+import pandas as pd
 
+def _fail(msg):
+    raise AssertionError(msg)
+
+# Numpy
 def check_array(arr, shape=None, dtype=None, allow_int_any=False):
     if not isinstance(arr, np.ndarray):
-        raise AssertionError(f"❌ Expected numpy.ndarray, got {type(arr)}")
+        _fail(f"❌ Expected numpy.ndarray, got {type(arr)}")
     if shape is not None and arr.shape != shape:
-        raise AssertionError(f"❌ Wrong shape: expected {shape}, got {arr.shape}")
+        _fail(f"❌ Wrong shape: expected {shape}, got {arr.shape}")
     if dtype is not None:
         if allow_int_any and np.issubdtype(arr.dtype, np.integer):
             pass
         elif not np.issubdtype(arr.dtype, dtype):
-            raise AssertionError(f"❌ Wrong dtype: expected {dtype}, got {arr.dtype}")
+            _fail(f"❌ Wrong dtype: expected {dtype}, got {arr.dtype}")
     print("✅ Array check passed.")
 
-def check_value(val, expected):
-    if val != expected:
-        raise AssertionError(f"❌ Wrong value: expected {expected}, got {val}")
+def check_value(val, expected, tol=1e-8):
+    if isinstance(val, (float, np.floating)) or isinstance(expected, (float, np.floating)):
+        if abs(float(val) - float(expected)) > tol:
+            _fail(f"❌ Wrong value: expected {expected}, got {val}")
+    else:
+        if val != expected:
+            _fail(f"❌ Wrong value: expected {expected}, got {val}")
     print("✅ Value check passed.")
+
+# pandas
+def check_dataframe_columns(df, expected_cols):
+    if not isinstance(df, pd.DataFrame):
+        _fail(f"❌ Expected pandas.DataFrame, got {type(df)}")
+    missing = [c for c in expected_cols if c not in df.columns]
+    if missing:
+        _fail(f"❌ Missing columns: {missing}")
+    print("✅ DataFrame columns check passed.")
+
+def check_series_index_values(s, expected_index_set):
+    if not isinstance(s, pd.Series):
+        _fail(f"❌ Expected pandas.Series, got {type(s)}")
+    if set(list(s.index)) != set(list(expected_index_set)):
+        _fail(f"❌ Unexpected index: got {list(s.index)}, expected set {list(expected_index_set)}")
+    print("✅ Series index check passed.")
+
+def check_len(obj, expected_len):
+    try:
+        n = len(obj)
+    except Exception as e:
+        _fail(f"❌ Object has no len(): {e}")
+    if n != expected_len:
+        _fail(f"❌ Wrong length: expected {expected_len}, got {n}")
+    print("✅ Length check passed.")
From ba151f93e62b6d25075f6206d9936dd52644f7c3 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Sun, 24 Aug 2025 20:36:02 -0400
Subject: [PATCH 22/26] Create 03_matplotlib_seaborn.ipynb
---
checkpoints/03_matplotlib_seaborn.ipynb | 237 ++++++++++++++++++++++++
1 file changed, 237 insertions(+)
create mode 100644 checkpoints/03_matplotlib_seaborn.ipynb
diff --git a/checkpoints/03_matplotlib_seaborn.ipynb b/checkpoints/03_matplotlib_seaborn.ipynb
new file mode 100644
index 0000000..d87d586
--- /dev/null
+++ b/checkpoints/03_matplotlib_seaborn.ipynb
@@ -0,0 +1,237 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# ✅ Checkpoint 03 - Matplotlib & Seaborn\n",
+ "\n",
+ "**Goal**\n",
+ "- Create basic plots with Matplotlib & Seaborn: scatter, histogram, boxplot, and aggregated barplot.\n",
+ "- Set titles/labels properly and export figures as files.\n",
+ "\n",
+ "**Rules**\n",
+ "- Fill only where marked as `# TODO`\n",
+ "- Do not change test cells (π)\n",
+ "- Run all cells in order before submitting\n",
+ "\n",
+ "**References**\n",
+ "- Matplotlib docs: https://matplotlib.org/stable/\n",
+ "- Seaborn docs: https://seaborn.pydata.org/\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 🔧 Setup\n",
+ "import os\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "import seaborn as sns\n",
+ "from utils.grader import (\n",
+ " check_value, check_len, check_file_exists,\n",
+ " check_axes_instance, check_xlabel, check_ylabel, check_title_contains,\n",
+ " check_num_lines, check_num_collections, check_num_patches\n",
+ ")\n",
+ "np.random.seed(42)\n",
+ "\n",
+ "# ensure output dir\n",
+ "os.makedirs('outputs', exist_ok=True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# small synthetic dataset (deterministic)\n",
+ "n = 120\n",
+ "days = np.random.choice(['Thur','Fri','Sat','Sun'], size=n, p=[0.25,0.2,0.3,0.25])\n",
+ "sex = np.random.choice(['Male','Female'], size=n)\n",
+ "smoker = np.random.choice(['Yes','No'], size=n, p=[0.3,0.7])\n",
+ "total_bill = np.round(np.random.normal(loc=24, scale=8, size=n).clip(5, 80), 2)\n",
+ "tip = np.round((total_bill * np.random.uniform(0.08, 0.22, size=n)), 2)\n",
+ "\n",
+ "df = pd.DataFrame({\n",
+ " 'day': days,\n",
+ " 'sex': sex,\n",
+ " 'smoker': smoker,\n",
+ " 'total_bill': total_bill,\n",
+ " 'tip': tip\n",
+ "})\n",
+ "df.head()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Q1) Matplotlib Scatter\n",
+ "Create a scatter plot of `total_bill` (x) vs `tip` (y) using **Matplotlib**.\n",
+ "- Put the **x label**: `Total Bill ($)`\n",
+ "- Put the **y label**: `Tip ($)`\n",
+ "- Title should contain the word **\"Scatter\"**\n",
+ "- Save the fig object in a variable named **`fig1`**, axes in **`ax1`**\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# TODO: create fig1, ax1, draw scatter, set labels and title\n",
+ "fig1, ax1 = ... # TODO\n",
+ "\n",
+ "# π Test\n",
+ "check_axes_instance(ax1)\n",
+ "check_xlabel(ax1, 'Total Bill ($)')\n",
+ "check_ylabel(ax1, 'Tip ($)')\n",
+ "check_title_contains(ax1, 'Scatter')\n",
+ "check_num_collections(ax1, 1) # one scatter collection\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Q2) Seaborn Boxplot\n",
+ "Using **Seaborn**, create a **boxplot** of `tip` by `day` (x=`day`, y=`tip`).\n",
+ "- Store the Axes in a variable named **`ax2`**\n",
+ "- x label must be `Day`, y label must be `Tip ($)`\n",
+ "- Title should contain the word **\"Box\"**\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# TODO: create ax2 using seaborn.boxplot\n",
+ "ax2 = ... # TODO\n",
+ "...\n",
+ "\n",
+ "# π Test\n",
+ "check_axes_instance(ax2)\n",
+ "check_xlabel(ax2, 'Day')\n",
+ "check_ylabel(ax2, 'Tip ($)')\n",
+ "check_title_contains(ax2, 'Box')\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Q3) Matplotlib Histogram\n",
+ "Create a **histogram** of `total_bill` with **10 bins** using Matplotlib.\n",
+ "- Save fig as **`fig3`**, axes as **`ax3`**\n",
+ "- Title should contain **\"Histogram\"**\n",
+ "- x label `Total Bill ($)`\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# TODO: histogram with 10 bins\n",
+ "fig3, ax3 = ... # TODO\n",
+ "...\n",
+ "\n",
+ "# π Test\n",
+ "check_axes_instance(ax3)\n",
+ "check_title_contains(ax3, 'Histogram')\n",
+ "check_xlabel(ax3, 'Total Bill ($)')\n",
+ "check_num_patches(ax3, 10)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Q4) Seaborn Aggregated Barplot\n",
+ "Add a computed column `tip_pct = tip / total_bill * 100`. Then plot the **mean tip % by day** using Seaborn (barplot).\n",
+ "- Store the Axes in **`ax4`**\n",
+ "- There should be one bar per unique day in `df['day']`\n",
+ "- y label should contain the `%` sign\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# TODO: add tip_pct column and make barplot of mean tip_pct by day\n",
+ "...\n",
+ "ax4 = ... # TODO\n",
+ "...\n",
+ "\n",
+ "# π Test\n",
+ "check_axes_instance(ax4)\n",
+ "unique_days = sorted(df['day'].unique().tolist())\n",
+ "check_len(ax4.patches, len(unique_days))\n",
+ "check_ylabel(ax4, '%') # contains percent sign\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Q5) Save Figure to File\n",
+ "Save the Q1 scatter figure to `outputs/fig_scatter.png` using `fig1.savefig(...)`.\n",
+ "- The path must be exactly `outputs/fig_scatter.png`\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# TODO: save fig1 to outputs/fig_scatter.png\n",
+ "...\n",
+ "\n",
+ "# π Test\n",
+ "check_file_exists('outputs/fig_scatter.png')\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ✅ Submit\n",
+ "- All tests above passed\n",
+ "- Save notebook and commit to your repo\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.11",
+ "mimetype": "text/x-python",
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "pygments_lexer": "ipython3",
+ "nbconvert_exporter": "python",
+ "file_extension": ".py"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
From c2e100ecfd37b137a8206a21b72a6d8efbaac3b9 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Sun, 24 Aug 2025 21:54:40 -0400
Subject: [PATCH 23/26] Update grader.py
---
checkpoints/utils/grader.py | 53 +++++++++++++++++++++++++++++++++++--
1 file changed, 51 insertions(+), 2 deletions(-)
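
Reviewer note (illustrative only, not part of the patch): the new helpers count artists attached to an Axes (`ax.collections`, `ax.patches`, `ax.lines`). The sketch below, run on made-up random data rather than the checkpoint dataset, shows which list each basic Matplotlib call populates. The variable names are assumptions for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.normal(size=50)

# scatter() adds a single PathCollection to ax.collections
fig_s, ax_s = plt.subplots()
ax_s.scatter(x, y)
n_collections_after_scatter = len(ax_s.collections)

# hist() with the default bar style adds one Rectangle patch per bin
fig_h, ax_h = plt.subplots()
ax_h.hist(x, bins=10)
n_patches_after_hist = len(ax_h.patches)

# plot() adds one Line2D to ax.lines per plotted series
fig_l, ax_l = plt.subplots()
ax_l.plot(x)
n_lines_after_plot = len(ax_l.lines)

plt.close("all")
print(n_collections_after_scatter, n_patches_after_hist, n_lines_after_plot)
```

Each plot gets its own figure here because the counts are cumulative per Axes; drawing two plots on the same Axes would change what the count-based checks see.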
diff --git a/checkpoints/utils/grader.py b/checkpoints/utils/grader.py
index 6237e45..c431013 100644
--- a/checkpoints/utils/grader.py
+++ b/checkpoints/utils/grader.py
@@ -1,10 +1,13 @@
+import os
 import numpy as np
 import pandas as pd
+import matplotlib
+import matplotlib.pyplot as plt
 
 def _fail(msg):
     raise AssertionError(msg)
 
-# Numpy
+# generic/NumPy/pandas
 def check_array(arr, shape=None, dtype=None, allow_int_any=False):
     if not isinstance(arr, np.ndarray):
         _fail(f"❌ Expected numpy.ndarray, got {type(arr)}")
@@ -26,7 +29,6 @@ def check_value(val, expected, tol=1e-8):
             _fail(f"❌ Wrong value: expected {expected}, got {val}")
     print("✅ Value check passed.")
 
-# pandas
 def check_dataframe_columns(df, expected_cols):
     if not isinstance(df, pd.DataFrame):
         _fail(f"❌ Expected pandas.DataFrame, got {type(df)}")
@@ -50,3 +52,50 @@ def check_len(obj, expected_len):
     if n != expected_len:
         _fail(f"❌ Wrong length: expected {expected_len}, got {n}")
     print("✅ Length check passed.")
+
+def check_file_exists(path):
+    if not os.path.exists(path):
+        _fail(f"❌ File not found: {path}")
+    print("✅ File exists.")
+
+# Matplotlib/Seaborn helpers for checkpoint 03
+def check_axes_instance(ax):
+    if not hasattr(ax, "get_xlabel") or not hasattr(ax, "get_ylabel"):
+        _fail(f"❌ Expected a Matplotlib Axes-like object, got {type(ax)}")
+    print("✅ Axes instance check passed.")
+
+def check_xlabel(ax, expected):
+    label = ax.get_xlabel()
+    if label != expected and expected not in label:
+        _fail(f"❌ X label mismatch. Got '{label}', expected '{expected}' (or containing it).")
+    print("✅ X label ok.")
+
+def check_ylabel(ax, expected):
+    label = ax.get_ylabel()
+    if label != expected and expected not in label:
+        _fail(f"❌ Y label mismatch. Got '{label}', expected '{expected}' (or containing it).")
+    print("✅ Y label ok.")
+
+def check_title_contains(ax, keyword):
+    title = ax.get_title()
+    if keyword not in title:
+        _fail(f"❌ Title does not contain '{keyword}'. Got '{title}'")
+    print("✅ Title contains keyword.")
+
+def check_num_lines(ax, expected_n):
+    n = len(ax.lines)
+    if n != expected_n:
+        _fail(f"❌ Expected {expected_n} line(s), got {n}")
+    print("✅ Number of lines ok.")
+
+def check_num_collections(ax, expected_n):
+    n = len(ax.collections)
+    if n != expected_n:
+        _fail(f"❌ Expected {expected_n} collection(s), got {n}")
+    print("✅ Number of collections ok.")
+
+def check_num_patches(ax, expected_n):
+    n = len(ax.patches)
+    if n != expected_n:
+        _fail(f"❌ Expected {expected_n} patch(es), got {n}")
+    print("✅ Number of patches ok.")
From 45523f81b02952f9efab8b9dec4df2674a893770 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Sun, 24 Aug 2025 23:09:48 -0400
Subject: [PATCH 24/26] Create 04_plotly_intro.ipynb
---
checkpoints/04_plotly_intro.ipynb | 237 ++++++++++++++++++++++++++++++
1 file changed, 237 insertions(+)
create mode 100644 checkpoints/04_plotly_intro.ipynb
diff --git a/checkpoints/04_plotly_intro.ipynb b/checkpoints/04_plotly_intro.ipynb
new file mode 100644
index 0000000..6912439
--- /dev/null
+++ b/checkpoints/04_plotly_intro.ipynb
@@ -0,0 +1,237 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# ✅ Checkpoint 04 - Plotly Intro\n",
+ "\n",
+ "**Goal**\n",
+ "- Build interactive charts with Plotly (scatter, histogram, bar) using both Express and Graph Objects.\n",
+ "- Set titles/axis labels, count traces, and export figures to HTML.\n",
+ "\n",
+ "**Rules**\n",
+ "- Fill only where marked as `# TODO`.\n",
+ "- Do not change test cells (π).\n",
+ "- Run all cells in order before submitting.\n",
+ "\n",
+ "**References**\n",
+ "- Plotly docs: https://plotly.com/python/\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# 🔧 Setup\n",
+ "import os\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import plotly.express as px\n",
+ "import plotly.graph_objects as go\n",
+ "from utils.grader import (\n",
+ " check_file_exists,\n",
+ " check_figure, check_trace_count,\n",
+ " check_axis_title, check_layout_title_contains,\n",
+ " check_bar_count, check_trace_modes\n",
+ ")\n",
+ "np.random.seed(42)\n",
+ "os.makedirs('outputs', exist_ok=True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Small deterministic dataset (similar to 'tips')\n",
+ "n = 120\n",
+ "days = np.random.choice(['Thur','Fri','Sat','Sun'], size=n, p=[0.25,0.2,0.3,0.25])\n",
+ "sex = np.random.choice(['Male','Female'], size=n)\n",
+ "smoker = np.random.choice(['Yes','No'], size=n, p=[0.3,0.7])\n",
+ "total_bill = np.round(np.random.normal(loc=24, scale=8, size=n).clip(5, 80), 2)\n",
+ "tip = np.round((total_bill * np.random.uniform(0.08, 0.22, size=n)), 2)\n",
+ "df = pd.DataFrame({\n",
+ " 'day': days,\n",
+ " 'sex': sex,\n",
+ " 'smoker': smoker,\n",
+ " 'total_bill': total_bill,\n",
+ " 'tip': tip\n",
+ "})\n",
+ "df.head()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Q1) Plotly Express - Scatter\n",
+ "Create a scatter plot of `total_bill` (x) vs `tip` (y) using **Plotly Express**.\n",
+ "- Color by `day` (optional but encouraged).\n",
+ "- Title should contain **\"Scatter\"**.\n",
+ "- x-axis title: `Total Bill ($)`; y-axis title: `Tip ($)`.\n",
+ "- Store the figure in **`fig1`**.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# TODO: create fig1 with px.scatter\n",
+ "fig1 = ... # TODO\n",
+ "# Example (for reference):\n",
+ "# fig1 = px.scatter(df, x='total_bill', y='tip', color='day', title='Scatter: Tip vs Total Bill')\n",
+ "# fig1.update_layout(xaxis_title='Total Bill ($)', yaxis_title='Tip ($)')\n",
+ "\n",
+ "# π Test\n",
+ "check_figure(fig1)\n",
+ "check_trace_count(fig1, expected_min=1) # at least 1 trace (color may create >1)\n",
+ "check_layout_title_contains(fig1, 'Scatter')\n",
+ "check_axis_title(fig1, axis='x', expected='Total Bill ($)')\n",
+ "check_axis_title(fig1, axis='y', expected='Tip ($)')\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Q2) Plotly Express - Histogram\n",
+ "Create a histogram of `total_bill` with **10 bins**.\n",
+ "- Title should contain **\"Histogram\"**.\n",
+ "- x-axis title: `Total Bill ($)`.\n",
+ "- Store the figure in **`fig2`**.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# TODO: create fig2 with px.histogram and nbins=10\n",
+ "fig2 = ... # TODO\n",
+ "# Example:\n",
+ "# fig2 = px.histogram(df, x='total_bill', nbins=10, title='Histogram: Total Bill')\n",
+ "# fig2.update_layout(xaxis_title='Total Bill ($)')\n",
+ "\n",
+ "# π Test\n",
+ "check_figure(fig2)\n",
+ "check_trace_count(fig2, expected_min=1)\n",
+ "check_layout_title_contains(fig2, 'Histogram')\n",
+ "check_axis_title(fig2, axis='x', expected='Total Bill ($)')\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Q3) Plotly Express - Bar (mean tip%)\n",
+ "Add a computed column `tip_pct = tip / total_bill * 100`. Then plot the **mean tip % by day** as a bar chart.\n",
+ "- One bar per unique `day`.\n",
+ "- y-axis title should contain `%`.\n",
+ "- Store the figure in **`fig3`**.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# TODO: compute tip_pct and create fig3\n",
+ "...\n",
+ "fig3 = ... # TODO\n",
+ "\n",
+ "# π Test\n",
+ "check_figure(fig3)\n",
+ "unique_days = sorted(df['day'].unique().tolist())\n",
+ "check_bar_count(fig3, expected=len(unique_days))\n",
+ "check_axis_title(fig3, axis='y', expected='%') # contains percent sign\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Q4) Graph Objects - Line (running mean of tip)\n",
+ "Using **plotly.graph_objects**, build a line chart of the running mean of `tip` over row index.\n",
+ "- Use `go.Figure` with a single `go.Scatter` trace in `'lines'` mode.\n",
+ "- Title should contain **\"Running Mean\"**.\n",
+ "- Store the figure in **`fig4`**.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# TODO: create running mean and fig4 with go.Figure\n",
+ "...\n",
+ "fig4 = ... # TODO\n",
+ "\n",
+ "# π Test\n",
+ "check_figure(fig4)\n",
+ "check_trace_count(fig4, expected_min=1, expected_max=1)\n",
+ "check_trace_modes(fig4, must_include='lines')\n",
+ "check_layout_title_contains(fig4, 'Running Mean')\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Q5) Export to HTML\n",
+ "Save the Q1 scatter figure to **`outputs/fig_scatter.html`** using `fig1.write_html(...)`.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# TODO: export fig1 to outputs/fig_scatter.html\n",
+ "...\n",
+ "\n",
+ "# π Test\n",
+ "check_file_exists('outputs/fig_scatter.html')\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### ✅ Submit\n",
+ "- All tests above passed\n",
+ "- Save notebook and commit to your repo\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.11",
+ "mimetype": "text/x-python",
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "pygments_lexer": "ipython3",
+ "nbconvert_exporter": "python",
+ "file_extension": ".py"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
From 0028375e3ca254efb1559f9ff741b5174b92ec87 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Mon, 25 Aug 2025 02:21:55 -0400
Subject: [PATCH 25/26] Update grader.py
---
checkpoints/utils/grader.py | 71 +++++++++++++++++++++++++++++++++++--
1 file changed, 69 insertions(+), 2 deletions(-)
diff --git a/checkpoints/utils/grader.py b/checkpoints/utils/grader.py
index c431013..4ac8796 100644
--- a/checkpoints/utils/grader.py
+++ b/checkpoints/utils/grader.py
@@ -1,3 +1,4 @@
+# utils/grader.py
import os
import numpy as np
import pandas as pd
@@ -7,7 +8,7 @@
def _fail(msg):
     raise AssertionError(msg)
-# generic/NumPy/pandas
+# Generic / NumPy / pandas
def check_array(arr, shape=None, dtype=None, allow_int_any=False):
     if not isinstance(arr, np.ndarray):
         _fail(f"❌ Expected numpy.ndarray, got {type(arr)}")
@@ -58,7 +59,7 @@ def check_file_exists(path):
         _fail(f"❌ File not found: {path}")
     print("✅ File exists.")
-# Matplotlib/Seaborn helpers for checkpoint 03
+# Matplotlib / Seaborn helpers
def check_axes_instance(ax):
     if not hasattr(ax, "get_xlabel") or not hasattr(ax, "get_ylabel"):
         _fail(f"❌ Expected a Matplotlib Axes-like object, got {type(ax)}")
@@ -99,3 +100,69 @@ def check_num_patches(ax, expected_n):
     if n != expected_n:
         _fail(f"❌ Expected {expected_n} patch(es), got {n}")
     print("✅ Number of patches ok.")
+
+# Plotly helpers
+def check_figure(fig):
+    try:
+        import plotly.graph_objects as go
+    except Exception as e:
+        _fail(f"❌ Plotly not installed: {e}")
+    if not isinstance(fig, go.Figure):
+        _fail(f"❌ Expected plotly.graph_objects.Figure, got {type(fig)}")
+    print("✅ Figure instance ok.")
+
+def check_trace_count(fig, expected_min=None, expected_max=None):
+    n = len(fig.data)
+    if expected_min is not None and n < expected_min:
+        _fail(f"❌ Too few traces: got {n}, expected >= {expected_min}")
+    if expected_max is not None and n > expected_max:
+        _fail(f"❌ Too many traces: got {n}, expected <= {expected_max}")
+    print("✅ Trace count ok.")
+
+def _get_axis(fig, axis):
+    if axis == 'x':
+        return fig.layout.xaxis
+    elif axis == 'y':
+        return fig.layout.yaxis
+    else:
+        _fail("❌ axis must be 'x' or 'y'")
+
+def check_axis_title(fig, axis='x', expected=None):
+    ax = _get_axis(fig, axis)
+    title = getattr(ax.title, "text", None) or ""  # text is None when no title is set
+    if expected is None:
+        _fail("❌ expected title text is None")
+    if expected != title and (expected not in title):
+        _fail(f"❌ {axis}-axis title mismatch. Got '{title}', expected '{expected}' (or containing it).")
+    print(f"✅ {axis.upper()} axis title ok.")
+
+def check_layout_title_contains(fig, keyword):
+    title = getattr(fig.layout.title, "text", None) or ""  # text is None when no title is set
+    if keyword not in title:
+        _fail(f"❌ Layout title does not contain '{keyword}'. Got '{title}'")
+    print("✅ Layout title contains keyword.")
+
+def check_bar_count(fig, expected):
+    if len(fig.data) == 0:
+        _fail("❌ No traces in figure.")
+    trace = fig.data[0]
+    xs = getattr(trace, "x", None)
+    if xs is None:
+        _fail("❌ Bar trace has no x values.")
+    n = len(xs)
+    if n != expected:
+        _fail(f"❌ Expected {expected} bars, got {n}")
+    print("✅ Bar count ok.")
+
+def check_trace_modes(fig, must_include='lines'):
+    if len(fig.data) == 0:
+        _fail("❌ No traces in figure.")
+    modes = []
+    for t in fig.data:
+        mode = getattr(t, "mode", None)
+        if mode:
+            modes.append(mode)
+    joined = ",".join(modes)
+    if must_include not in joined:
+        _fail(f"❌ Required mode '{must_include}' not found in traces. Got modes: {modes}")
+    print("✅ Trace mode ok.")
From 8cdde1248efae90de785a6e761db5ba6375267f4 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 4 Sep 2025 16:46:47 -0400
Subject: [PATCH 26/26] Update README.md
---
README.md | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 73 insertions(+), 3 deletions(-)
diff --git a/README.md b/README.md
index cff496d..8aaeca6 100644
--- a/README.md
+++ b/README.md
@@ -213,9 +213,79 @@ stacked = np.vstack([c, d]) # vertical stack of two 2×3 arrays
- 🔥Basic Machine Learning with scikit-learn🔥
- Build your first regression and classification models, split data, and evaluate performance.
-
+ 🔥Basic Machine Learning with scikit-learn🔥
+Build your first regression and classification models, split data, and evaluate performance.
+
+## Library Overview
+scikit-learn is one of the most widely used ML libraries in Python.
+It provides simple APIs for preprocessing, training models, and evaluating performance.
+
+### ✨ Key Features
+- Large collection of supervised & unsupervised algorithms
+- Easy dataset splitting, scaling, and pipelines
+- Built-in metrics for evaluation
+- Works seamlessly with NumPy & pandas
+
+---
+
+### 1. What
+> **What you will learn in this section.**
+> By the end of this notebook, you will be able to:
+> - Split data into train/test sets
+> - Train a simple regression model
+> - Train a classification model
+> - Evaluate predictions using accuracy and error metrics
+
+---
+
+### 2. Why
+> **Why this topic matters.**
+> - Machine Learning is the core of many data science projects.
+> - scikit-learn offers a consistent interface to try many models quickly.
+> - Understanding the ML workflow (split → train → predict → evaluate) is essential.
+
+---
+### 3. How
+> **How to do it.**
+> Follow these hands-on examples:
+
+```python
+from sklearn.datasets import load_iris, make_regression
+from sklearn.model_selection import train_test_split
+from sklearn.linear_model import LinearRegression, LogisticRegression
+from sklearn.metrics import mean_squared_error, accuracy_score
+import numpy as np
+
+# --- Regression Example ---
+# Generate synthetic data
+X_reg, y_reg = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
+
+# Train/test split
+X_train, X_test, y_train, y_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
+
+# Fit linear regression
+reg = LinearRegression()
+reg.fit(X_train, y_train)
+
+# Predict and evaluate
+y_pred = reg.predict(X_test)
+print("MSE (Regression):", mean_squared_error(y_test, y_pred))
+
+
+# --- Classification Example ---
+iris = load_iris()
+X_clf, y_clf = iris.data, iris.target
+
+X_train, X_test, y_train, y_test = train_test_split(X_clf, y_clf, test_size=0.2, random_state=42)
+
+clf = LogisticRegression(max_iter=200)
+clf.fit(X_train, y_train)
+
+y_pred = clf.predict(X_test)
+print("Accuracy (Classification):", accuracy_score(y_test, y_pred))
+
+```
+