From 3bcd5ff3c5d58cc6c5bf27b4ccb986d413a672c2 Mon Sep 17 00:00:00 2001
From: Aiden <113921954+cereal-with-water@users.noreply.github.com>
Date: Thu, 24 Jul 2025 20:02:48 -0400
Subject: [PATCH 01/26] Update README.md

---
 README.md | 48 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c3138ed..421041a 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,48 @@
 # Python-Data-Science-Onboarding
-Coming Soon
+
+# Onboarding Tutorial
+
+Welcome to the WMU DSC/Developer Club!
+
+This repository is designed to help new members get familiar with the tools and workflows commonly used in our data science projects.
+
+---
+
+## 🚀 Who is this for?
+
+This tutorial assumes you already have *basic Python knowledge*, including:
+
+- Using numpy and pandas for data handling
+- Knowing what a .ipynb Jupyter Notebook file is
+- Using scikit-learn to build simple machine learning models
+
+> *Don't know Python yet?* No problem!
+> Start with the resources below before continuing:
+>
+> - [W3Schools Python Tutorial](https://www.w3schools.com/python/)
+> - [Google's Python Class](https://developers.google.com/edu/python)
+> - [Python for Beginners (YouTube)](https://www.youtube.com/watch?v=K5KVEU3aaeQ&t=56s)
+
+
+ Python Installation Guide (For Beginners)
+
+To follow along with the notebooks in this repository, you need Python installed on your machine.
+
+### 🎥 How to Install Python
+
+ [For macOS](https://www.youtube.com/watch?v=nhv82tvFfkM)
+ [For Windows](https://www.youtube.com/watch?v=YagM_FuPLQU)
+
+> 📌 *Important*: During installation, make sure to check:
+> *“Add Python to PATH”*
+
+### Verify Your Installation
+
+After installing, open a terminal (or Command Prompt on Windows), and run:
+
+```bash
+python --version
+pip --version
+```
+
+
From ae5999ce70db1b781bbc11b658aa23235b5a0211 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Thu, 24 Jul 2025 20:12:00 -0400 Subject: [PATCH 02/26] Update README.md --- README.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/README.md b/README.md index 421041a..24fc5b7 100644 --- a/README.md +++ b/README.md @@ -46,3 +46,18 @@ pip --version ``` + +--- + +## πŸ“˜ Core Topics + +- πŸ”Έ [Data Handling with NumPy & Pandas]([notebooks/tutorial0/tutorial0.ipynb](https://github.com/cereal-with-water/Numpy-Pandas-tutorial)) + Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames. + +- πŸ”Έ [Understanding Jupyter Notebooks (`.ipynb`)]([docs/ipynb_guide.md](https://github.com/cereal-with-water/Jupyter-Notebooks-tutorial)) + What are text vs code cells, how to run them, and best practices for documenting your analysis. + +- πŸ”Έ [Basic Machine Learning with scikit-learn]([notebooks/tutorial1/tutorial1.ipynb](https://github.com/cereal-with-water/ML-tools-tutorial)) + Build your first regression and classification models, split data, and evaluate performance. + +--- From 178a67e500975434902e4673a368996c8c4c5a0a Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Thu, 24 Jul 2025 20:14:02 -0400 Subject: [PATCH 03/26] Update README.md --- README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 24fc5b7..6e449c0 100644 --- a/README.md +++ b/README.md @@ -51,13 +51,14 @@ pip --version ## πŸ“˜ Core Topics -- πŸ”Έ [Data Handling with NumPy & Pandas]([notebooks/tutorial0/tutorial0.ipynb](https://github.com/cereal-with-water/Numpy-Pandas-tutorial)) +- πŸ”Έ [Data Handling with NumPy & Pandas](https://github.com/cereal-with-water/Numpy-Pandas-tutorial) Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames. 
-- πŸ”Έ [Understanding Jupyter Notebooks (`.ipynb`)]([docs/ipynb_guide.md](https://github.com/cereal-with-water/Jupyter-Notebooks-tutorial)) +- πŸ”Έ [Understanding Jupyter Notebooks (`.ipynb`)](https://github.com/cereal-with-water/Jupyter-Notebooks-tutorial) What are text vs code cells, how to run them, and best practices for documenting your analysis. -- πŸ”Έ [Basic Machine Learning with scikit-learn]([notebooks/tutorial1/tutorial1.ipynb](https://github.com/cereal-with-water/ML-tools-tutorial)) +- πŸ”Έ [Basic Machine Learning with scikit-learn](https://github.com/cereal-with-water/ML-tools-tutorial) Build your first regression and classification models, split data, and evaluate performance. --- + From 5c2a155f1778a1602698c078d573b19c53f28c46 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Thu, 24 Jul 2025 20:27:17 -0400 Subject: [PATCH 04/26] Update README.md --- README.md | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/README.md b/README.md index 6e449c0..d22c1db 100644 --- a/README.md +++ b/README.md @@ -49,6 +49,32 @@ pip --version --- +## πŸ“¦ Recommended Libraries + +In Python, you install packages by running: +```bash +pip install +``` + +Before you dive into the notebooks, make sure you have the core data-science libraries installed. 
You can install them all at once via pip: + +```bash +pip install \ + numpy \ + pandas \ + matplotlib \ + seaborn \ + scikit-learn \ + notebook +``` + +Or, if you prefer a single command: +``` +pip install numpy pandas matplotlib seaborn scikit-learn notebook +``` + +--- + ## πŸ“˜ Core Topics - πŸ”Έ [Data Handling with NumPy & Pandas](https://github.com/cereal-with-water/Numpy-Pandas-tutorial) From c41ad9bc4832c33a4eacf419322989b99460f4e7 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Tue, 12 Aug 2025 02:38:38 -0400 Subject: [PATCH 05/26] Update README.md --- README.md | 148 +++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 140 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index d22c1db..696687b 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,5 @@ # Python-Data-Science-Onboarding -# Onboarding Tutorial - Welcome to the WMU DSC/Developer Club! This repository is designed to help new members get familiar with the tools and workflows commonly used in our data science projects. @@ -24,7 +22,7 @@ This tutorial assumes you already have *basic Python knowledge*, including: > - [Python for Beginners (YouTube)](https://www.youtube.com/watch?v=K5KVEU3aaeQ&t=56s)
- Python Installation Guide (For Beginners) + πŸ’‘Python Installation Guide For Beginners To follow along with the notebooks in this repository, you need Python installed on your machine. @@ -77,14 +75,148 @@ pip install numpy pandas matplotlib seaborn scikit-learn notebook ## πŸ“˜ Core Topics -- πŸ”Έ [Data Handling with NumPy & Pandas](https://github.com/cereal-with-water/Numpy-Pandas-tutorial) - Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames. +
+ 🔥Data Handling with NumPy & Pandas🔥
+ Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames.
+ # Numpy & Pandas
+
+## 🔍 Library Overview
+
+Before we dive in, here's a quick intro to the two core libraries we’ll use:
+
+### NumPy
+- **The fundamental package for numerical computing in Python.**
+- **Key features:**
+  - **Arrays:** Homogeneous, N-dimensional arrays (faster and more memory-efficient than Python lists)
+  - **Vectorized ops:** Element-wise arithmetic without explicit loops
+  - **Linear algebra & random:** Built-in support for matrix operations and pseudo-random number generation
+
+### Pandas
+- **A powerful data analysis and manipulation library built on top of NumPy.**
+- **Key features:**
+  - **DataFrame:** 2D tabular data structure with labeled axes (rows & columns)
+  - **IO tools:** Read/write CSV, Excel, SQL, JSON, and more
+  - **Series:** 1D labeled array, great for time series and single-column tables
+  - **Grouping & aggregation:** Split-apply-combine workflows for summarizing data
+
+
+
+### 1. What
+> **What you will learn in this section.**
+> By the end of this notebook, you will be able to:
+> - Create and manipulate NumPy arrays of different shapes and dtypes
+> - Perform element-wise arithmetic and universal functions
+> - Index, slice, and reshape arrays for efficient computation
+
+---
+
+### 2. Why
+> **Why this topic matters.**
+> NumPy arrays are the foundation of nearly all scientific computing in Python.
+> They provide:
+> - **Speed:** Vectorized operations run much faster than Python loops
+> - **Memory efficiency:** Compact storage of homogeneous data
+> - **Interoperability:** A common data structure for libraries like Pandas, SciPy, and scikit-learn
+
+---
+
+### 3. How
+> **How to do it.**
+> Follow these step-by-step examples:
+
+```python
+import numpy as np
+
+# 1) Create arrays
+a = np.array([1, 2, 3, 4])
+b = np.arange(0, 10, 2)  # [0, 2, 4, 6, 8]
+c = np.zeros((2, 3), dtype=int)  # 2×3 array of zeros
+
+# 2) Element-wise arithmetic
+sum_ab = a + b[:4]  # adds element by element
+prod_ab = a * b[:4]  # multiplies element by element
+
+# 3) Universal functions
+sqrt_b = np.sqrt(b)  # square root of each element
+exp_a = np.exp(a)  # eᵃ for each element
+
+# 4) Indexing & slicing
+row = b[2:5]  # slice subarray
+c[0, :] = row  # assign a row
+
+# 5) Reshape & combine
+d = np.linspace(0, 1, 6).reshape(2, 3)
+stacked = np.vstack([c, d])  # vertical stack of two 2×3 arrays
+
+
+```
+
+
+
+
+ +
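As a companion to the NumPy walkthrough, the Pandas features named in the Library Overview (DataFrame, Series, grouping and aggregation) can be sketched in the same way. This is only a minimal illustration: the column names `team` and `score` and all values are invented here and do not come from the club's notebooks.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only
df = pd.DataFrame({
    "team": ["a", "a", "b", "b"],
    "score": [1.0, 3.0, 2.0, 6.0],
})

# Selecting one column gives a Series (a 1D labeled array)
scores = df["score"]

# Element-wise arithmetic works on whole columns, as with NumPy arrays
df["score_doubled"] = scores * 2

# Split-apply-combine: split rows by team, take the mean of each group
team_means = df.groupby("team")["score"].mean()

# Boolean filtering keeps only the rows where the condition is True
high = df[df["score"] > 2.0]

print(team_means)
print(high)
```

Grouping by `team` and then selecting `score` before calling `mean()` keeps the aggregation limited to the one numeric column of interest, which is usually clearer than aggregating the whole frame.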
+ πŸ”₯Understanding Jupyter Notebooks (.ipynb)πŸ”₯ What are text vs code cells, how to run them, and best practices for documenting your analysis. + # πŸ“ Jupyter Notebook Quickstart Guide -- πŸ”Έ [Basic Machine Learning with scikit-learn](https://github.com/cereal-with-water/ML-tools-tutorial) - Build your first regression and classification models, split data, and evaluate performance. +This guide will introduce you to Jupyter Notebookβ€”from β€œwhat it is” to how to install and use it locally or in the cloudβ€”then walk you through basic operations, hands-on examples, Markdown usage, and sharing. + +--- + +## πŸ” What Is Jupyter Notebook? + +Jupyter Notebook is an interactive computing environment where you can combine live code, equations, visualizations, and narrative text in a single document (`.ipynb`). It’s widely used for data analysis, teaching, and rapid prototyping. + +- **Key Features** + - Interactive code execution + - Rich text via Markdown (headings, lists, LaTeX) + - Inline data visualizations + - Easy sharing and reproducibility + +--- + +## βš™οΈ Installation & Access + +### 1. Install Locally + +You’ll need Python installed first. Then: + +```bash +# Install Jupyter Notebook via pip +pip install notebook +``` +Or, if you use Conda: +```bash +conda install -c conda-forge notebook +``` +After installation, launch the notebook server: +```bash +jupyter notebook +``` +Your default browser will open at http://localhost:8888, showing the notebook dashboard. + +### 2. Use JupyterLab (Optional) +For a more full-featured interface: + +```bash +pip install jupyterlab +jupyter lab +``` +### 3. Cloud / Web Options +Google Colab + +1. Go to colab.research.google.com +2. Sign in with your Google account +3. Open or upload any .ipynb file + + + +
+ +
+ πŸ”₯Basic Machine Learning with scikit-learnπŸ”₯ + Build your first regression and classification models, split data, and evaluate performance.
--- From 53136abf5efff5c92aa4b50346e7dd5141976f4c Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Tue, 12 Aug 2025 03:58:04 -0400 Subject: [PATCH 06/26] Update README.md --- README.md | 61 +++++++++++++++++++++++++++---------------------------- 1 file changed, 30 insertions(+), 31 deletions(-) diff --git a/README.md b/README.md index 696687b..addcdc6 100644 --- a/README.md +++ b/README.md @@ -1,10 +1,11 @@ # Python-Data-Science-Onboarding -Welcome to the WMU DSC/Developer Club! - +Welcome to the WMU DSC/Developer Club!
This repository is designed to help new members get familiar with the tools and workflows commonly used in our data science projects. +
+
+ ---- ## πŸš€ Who is this for? @@ -14,23 +15,25 @@ This tutorial assumes you already have *basic Python knowledge*, including: - Knowing what a .ipynb Jupyter Notebook file is - Using scikit-learn to build simple machine learning models -> *Don't know Python yet?* No problem! -> Start with the resources below before continuing: -> -> - [W3Schools Python Tutorial](https://www.w3schools.com/python/) -> - [Google's Python Class](https://developers.google.com/edu/python) -> - [Python for Beginners (YouTube)](https://www.youtube.com/watch?v=K5KVEU3aaeQ&t=56s) -
- πŸ’‘Python Installation Guide For Beginners +❓Don't know Python yet? No problem!❓ +
-To follow along with the notebooks in this repository, you need Python installed on your machine. +> **Start with the resources below before continuing:**
+>   [W3Schools Python Tutorial](https://www.w3schools.com/python/) +>   [Google's Python Class](https://developers.google.com/edu/python) +>   [Python for Beginners (YouTube)](https://www.youtube.com/watch?v=K5KVEU3aaeQ&t=56s) +
-### πŸŽ₯ How to Install Python - - [For macOS](https://www.youtube.com/watch?v=nhv82tvFfkM) - [For Windows](https://www.youtube.com/watch?v=YagM_FuPLQU) +
+ ❓Python Installation Guide For Beginners❓ +
+ +> ### To follow along with the notebooks in this repository, you need Python installed on your machine. +> ### πŸŽ₯ How to Install Python +>    [For macOS](https://www.youtube.com/watch?v=nhv82tvFfkM) +>    [For Windows](https://www.youtube.com/watch?v=YagM_FuPLQU)

> πŸ“Œ *Important*: During installation, make sure to check: > *β€œAdd Python to PATH”* @@ -42,10 +45,11 @@ After installing, open a terminal (or Command Prompt on Windows), and run: python --version pip --version ``` -
+
+
+ ---- ## πŸ“¦ Recommended Libraries @@ -65,20 +69,16 @@ pip install \ scikit-learn \ notebook ``` +
+
-Or, if you prefer a single command: -``` -pip install numpy pandas matplotlib seaborn scikit-learn notebook -``` ---- ## πŸ“˜ Core Topics
πŸ”₯Data Handling with NumPy & PandasπŸ”₯ Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames. - # Numpy & Pandas ## πŸ” Library Overview @@ -150,15 +150,14 @@ stacked = np.vstack([c, d]) # vertical stack of two 2Γ—3 arrays ``` +
- -
πŸ”₯Understanding Jupyter Notebooks (.ipynb)πŸ”₯ - What are text vs code cells, how to run them, and best practices for documenting your analysis. - # πŸ“ Jupyter Notebook Quickstart Guide +What are text vs code cells, how to run them, and best practices for documenting your analysis. +# πŸ“ Jupyter Notebook Quickstart Guide This guide will introduce you to Jupyter Notebookβ€”from β€œwhat it is” to how to install and use it locally or in the cloudβ€”then walk you through basic operations, hands-on examples, Markdown usage, and sharing. @@ -209,14 +208,14 @@ Google Colab 1. Go to colab.research.google.com 2. Sign in with your Google account 3. Open or upload any .ipynb file +
- -
πŸ”₯Basic Machine Learning with scikit-learnπŸ”₯ - Build your first regression and classification models, split data, and evaluate performance.
+ Build your first regression and classification models, split data, and evaluate performance. + --- From 395646502f2eca498eb5c6177a2c30d7920b17f2 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Tue, 12 Aug 2025 03:58:27 -0400 Subject: [PATCH 07/26] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index addcdc6..f68d1d2 100644 --- a/README.md +++ b/README.md @@ -217,5 +217,5 @@ Google Colab Build your first regression and classification models, split data, and evaluate performance. ---- + From 7db8abb6b4d97312eb5dc64a82a3883cb0173d11 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Tue, 12 Aug 2025 22:52:27 -0400 Subject: [PATCH 08/26] Libraries --- libraries | 45 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100644 libraries diff --git a/libraries b/libraries new file mode 100644 index 0000000..816ac87 --- /dev/null +++ b/libraries @@ -0,0 +1,45 @@ +# πŸ“š Top 26 Python Libraries for Data Science + +## Staple Python Libraries for Data Science +1. **NumPy** – Core numerical computing library in Python, offering fast operations on multi-dimensional arrays and matrices, essential for scientific computing and linear algebra. +2. **pandas** – Powerful data analysis/manipulation tool providing DataFrame structures, easy I/O with multiple file formats, and advanced indexing, grouping, and time series functionality. +3. **Matplotlib** – Fundamental plotting library for creating static, interactive, and animated visualizations with full customization. +4. **Seaborn** – High-level statistical visualization library built on Matplotlib, offering attractive and informative default styles for complex plots. +5. **Plotly** – Interactive graphing library for web-based visualizations, supporting 3D charts and dashboards via Dash. +6. 
**scikit-learn** – Comprehensive machine learning library for classification, regression, clustering, and preprocessing, with a consistent API.
+
+---
+
+## Machine Learning Python Libraries
+7. **LightGBM** – Gradient boosting framework optimized for speed, memory efficiency, and accuracy, supporting large-scale and GPU-based learning.
+8. **XGBoost** – Widely used gradient boosting library known for performance in Kaggle competitions, supporting distributed training and multiple platforms.
+9. **CatBoost** – High-performance gradient boosting library with strong categorical feature handling and excellent CPU/GPU support.
+10. **Statsmodels** – Statistical modeling library for regression, hypothesis testing, and time series analysis, with an R-like interface.
+11. **RAPIDS cuDF/cuML** – NVIDIA GPU-accelerated libraries for DataFrame manipulation (cuDF) and machine learning (cuML) with pandas- and scikit-learn-like APIs.
+12. **Optuna** – Hyperparameter optimization framework with efficient algorithms, pruning, and visualization tools.
+
+---
+
+## Automated Machine Learning (AutoML) Python Libraries
+13. **PyCaret** – Low-code machine learning library automating the end-to-end ML workflow for rapid experimentation.
+14. **H2O** – Scalable ML platform for big data, supporting distributed computing and AutoML.
+15. **TPOT** – AutoML tool using genetic programming to optimize ML pipelines automatically.
+16. **auto-sklearn** – Automated model selection and hyperparameter tuning built on scikit-learn with Bayesian optimization.
+17. **FLAML** – Lightweight AutoML library focused on finding accurate models quickly with minimal computational cost.
+
+---
+
+## Deep Learning Python Libraries
+18. **TensorFlow** – Google’s open-source ML framework for scalable deep learning, offering APIs for building, training, and deploying models.
+19. **PyTorch** – Facebook’s deep learning framework known for dynamic computation graphs, ease of use, and strong research-to-production transition.
+20. **FastAI** – High-level deep learning library on PyTorch with concise APIs for state-of-the-art results.
+21. **Keras** – User-friendly deep learning API integrated with TensorFlow, designed for quick prototyping and experimentation.
+22. **PyTorch Lightning** – Lightweight wrapper for PyTorch that organizes code for reproducibility and scalability.
+
+---
+
+## Natural Language Processing Python Libraries
+23. **NLTK** – Comprehensive NLP toolkit for tokenization, parsing, and linguistic processing with access to corpora like WordNet.
+24. **spaCy** – Industrial-strength NLP library for large-scale text processing, supporting deep learning integration and 60+ languages.
+25. **Gensim** – Topic modeling and vector space modeling library optimized for large corpora and memory efficiency.
+26. **Hugging Face Transformers** – Library for state-of-the-art transformer-based models for text, vision, audio, and multimodal tasks, supporting PyTorch, TensorFlow, and JAX.
From ee1046a9459d6ce4deeffb3a76e51c20e9588605 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Tue, 12 Aug 2025 22:54:08 -0400 Subject: [PATCH 09/26] Libraries --- libraries => libraries.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename libraries => libraries.md (100%) diff --git a/libraries b/libraries.md similarity index 100% rename from libraries rename to libraries.md From 0f42452004e608fc565aa6b238f83e0177786264 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Tue, 12 Aug 2025 22:58:33 -0400 Subject: [PATCH 10/26] Update libraries.md --- libraries.md | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/libraries.md b/libraries.md index 816ac87..6744b24 100644 --- a/libraries.md +++ b/libraries.md @@ -1,5 +1,7 @@ # πŸ“š Top 26 Python Libraries for Data Science + + ## Staple Python Libraries for Data Science 1. **NumPy** – Core numerical computing library in Python, offering fast operations on multi-dimensional arrays and matrices, essential for scientific computing and linear algebra. 2. **pandas** – Powerful data analysis/manipulation tool providing DataFrame structures, easy I/O with multiple file formats, and advanced indexing, grouping, and time series functionality. @@ -8,26 +10,29 @@ 5. **Plotly** – Interactive graphing library for web-based visualizations, supporting 3D charts and dashboards via Dash. 6. **scikit-learn** – Comprehensive machine learning library for classification, regression, clustering, and preprocessing, with a consistent API. ---- + +
## Machine Learning Python Libraries 7. **LightGBM** – Gradient boosting framework optimized for speed, memory efficiency, and accuracy, supporting large-scale and GPU-based learning. 8. **XGBoost** – Widely used gradient boosting library known for performance in Kaggle competitions, supporting distributed training and multiple platforms. 9. **CatBoost** – High-performance gradient boosting library with strong categorical feature handling and excellent CPU/GPU support. 10. **Statsmodels** – Statistical modeling library for regression, hypothesis testing, and time series analysis, with an R-like interface. -11. **RAPIDS cuDF/cuML** – NVIDIA GPU-accelerated libraries for DataFrame manipulation (cuDF) and machine learning (cuML) with pandas- and scikit-learn-like APIs. +11. **RAPIDS cuDF/cuML** – NVIDIA GPU-accelerated libraries for DataFrame manipulation and machine learning with pandas- and scikit-learn-like APIs. 12. **Optuna** – Hyperparameter optimization framework with efficient algorithms, pruning, and visualization tools. ---- +
-## Automated Machine Learning (AutoML) Python Libraries + +## Automated Machine Learning Python Libraries 13. **PyCaret** – Low-code machine learning library automating the end-to-end ML workflow for rapid experimentation. 14. **H2O** – Scalable ML platform for big data, supporting distributed computing and AutoML. 15. **TPOT** – AutoML tool using genetic programming to optimize ML pipelines automatically. 16. **auto-sklearn** – Automated model selection and hyperparameter tuning built on scikit-learn with Bayesian optimization. 17. **FLAML** – Lightweight AutoML library focused on finding accurate models quickly with minimal computational cost. ---- +
+ ## Deep Learning Python Libraries 18. **TensorFlow** – Google’s open-source ML framework for scalable deep learning, offering APIs for building, training, and deploying models. @@ -36,7 +41,8 @@ 21. **Keras** – User-friendly deep learning API integrated with TensorFlow, designed for quick prototyping and experimentation. 22. **PyTorch Lightning** – Lightweight wrapper for PyTorch that organizes code for reproducibility and scalability. ---- + +
## Natural Language Processing Python Libraries 23. **NLTK** – Comprehensive NLP toolkit for tokenization, parsing, and linguistic processing with access to corpora like WordNet. From 9823fdae28a04ab48c87cc5bbd2184005e5d42c7 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Wed, 13 Aug 2025 00:17:56 -0400 Subject: [PATCH 11/26] Update README.md --- README.md | 116 +++++++++++++++++++++++++++--------------------------- 1 file changed, 58 insertions(+), 58 deletions(-) diff --git a/README.md b/README.md index f68d1d2..cff496d 100644 --- a/README.md +++ b/README.md @@ -76,6 +76,64 @@ pip install \ ## πŸ“˜ Core Topics +
+ πŸ”₯Understanding Jupyter Notebooks (.ipynb)πŸ”₯ +What are text vs code cells, how to run them, and best practices for documenting your analysis. +# πŸ“ Jupyter Notebook Quickstart Guide + +This guide will introduce you to Jupyter Notebookβ€”from β€œwhat it is” to how to install and use it locally or in the cloudβ€”then walk you through basic operations, hands-on examples, Markdown usage, and sharing. + +--- + +## πŸ” What Is Jupyter Notebook? + +Jupyter Notebook is an interactive computing environment where you can combine live code, equations, visualizations, and narrative text in a single document (`.ipynb`). It’s widely used for data analysis, teaching, and rapid prototyping. + +- **Key Features** + - Interactive code execution + - Rich text via Markdown (headings, lists, LaTeX) + - Inline data visualizations + - Easy sharing and reproducibility + +--- + +## βš™οΈ Installation & Access + +### 1. Install Locally + +You’ll need Python installed first. Then: + +```bash +# Install Jupyter Notebook via pip +pip install notebook +``` +Or, if you use Conda: +```bash +conda install -c conda-forge notebook +``` +After installation, launch the notebook server: +```bash +jupyter notebook +``` +Your default browser will open at http://localhost:8888, showing the notebook dashboard. + +### 2. Use JupyterLab (Optional) +For a more full-featured interface: + +```bash +pip install jupyterlab +jupyter lab +``` +### 3. Cloud / Web Options +Google Colab + +1. Go to colab.research.google.com +2. Sign in with your Google account +3. Open or upload any .ipynb file +
+ + +
πŸ”₯Data Handling with NumPy & PandasπŸ”₯ Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames. @@ -154,64 +212,6 @@ stacked = np.vstack([c, d]) # vertical stack of two 2Γ—3 arrays -
- πŸ”₯Understanding Jupyter Notebooks (.ipynb)πŸ”₯ -What are text vs code cells, how to run them, and best practices for documenting your analysis. -# πŸ“ Jupyter Notebook Quickstart Guide - -This guide will introduce you to Jupyter Notebookβ€”from β€œwhat it is” to how to install and use it locally or in the cloudβ€”then walk you through basic operations, hands-on examples, Markdown usage, and sharing. - ---- - -## πŸ” What Is Jupyter Notebook? - -Jupyter Notebook is an interactive computing environment where you can combine live code, equations, visualizations, and narrative text in a single document (`.ipynb`). It’s widely used for data analysis, teaching, and rapid prototyping. - -- **Key Features** - - Interactive code execution - - Rich text via Markdown (headings, lists, LaTeX) - - Inline data visualizations - - Easy sharing and reproducibility - ---- - -## βš™οΈ Installation & Access - -### 1. Install Locally - -You’ll need Python installed first. Then: - -```bash -# Install Jupyter Notebook via pip -pip install notebook -``` -Or, if you use Conda: -```bash -conda install -c conda-forge notebook -``` -After installation, launch the notebook server: -```bash -jupyter notebook -``` -Your default browser will open at http://localhost:8888, showing the notebook dashboard. - -### 2. Use JupyterLab (Optional) -For a more full-featured interface: - -```bash -pip install jupyterlab -jupyter lab -``` -### 3. Cloud / Web Options -Google Colab - -1. Go to colab.research.google.com -2. Sign in with your Google account -3. Open or upload any .ipynb file -
- - -
πŸ”₯Basic Machine Learning with scikit-learnπŸ”₯ Build your first regression and classification models, split data, and evaluate performance. From 9a2f831c24713564b93c6ae9eead5bc42fcbed2f Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Mon, 18 Aug 2025 15:32:08 -0400 Subject: [PATCH 12/26] Create man.md --- tutorial/man.md | 1 + 1 file changed, 1 insertion(+) create mode 100644 tutorial/man.md diff --git a/tutorial/man.md b/tutorial/man.md new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/tutorial/man.md @@ -0,0 +1 @@ + From c800a1fa986ba7d585bfc52f1ba70f2a11085583 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Mon, 18 Aug 2025 16:05:43 -0400 Subject: [PATCH 13/26] Delete tutorial/man.md --- tutorial/man.md | 1 - 1 file changed, 1 deletion(-) delete mode 100644 tutorial/man.md diff --git a/tutorial/man.md b/tutorial/man.md deleted file mode 100644 index 8b13789..0000000 --- a/tutorial/man.md +++ /dev/null @@ -1 +0,0 @@ - From ceb0bf1d7aea6ceab7a518e8377439e2fa99d2d2 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Mon, 18 Aug 2025 16:59:36 -0400 Subject: [PATCH 14/26] Create 01_numpy_basics.md --- checkpoints/01_numpy_basics.md | 71 ++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) create mode 100644 checkpoints/01_numpy_basics.md diff --git a/checkpoints/01_numpy_basics.md b/checkpoints/01_numpy_basics.md new file mode 100644 index 0000000..594e7bc --- /dev/null +++ b/checkpoints/01_numpy_basics.md @@ -0,0 +1,71 @@ +# βœ… Checkpoint 01 β€” NumPy Basics + +**Goal** +- Create/reshape arrays, vectorized ops, boolean masking + +**Rules** +- Fill only where marked as `# TODO` +- Do not change test cells (πŸ”’) +- Run all cells before submitting + +**References** +- NumPy docs: https://numpy.org/doc/ + +--- + +```python +# πŸ”§ Setup +import numpy as np +import pandas as pd +from 
utils.grader import check_array, check_value + +np.random.seed(42) +``` + + +# Q1) Create a 3x3 array with values 0..8 (row-major) +# TODO: assign to variable 'A' +A = ... # TODO + +# πŸ”’ Test +check_array(A, shape=(3,3), dtype=np.integer) +check_value(A.sum(), 36) + +# Q2) From A, create a boolean mask selecting even numbers +# TODO: assign to variable 'mask_even' +mask_even = ... # TODO + +# πŸ”’ Test +check_array(mask_even, shape=(3,3), dtype=bool) +check_value(int(mask_even.sum()), 5) # number of evens in 0..8 + +# Q3) Reshape, stack, and compute row-wise means β†’ 'means' +# TODO: assign to variable 'means' (1D array length 3) +B = ... # TODO +means = ... # TODO + +# πŸ”’ Test +check_array(means, shape=(3,)) + + +# Q4) Broadcasting: A (3x3) and v (1x3) β†’ 'C' +v = np.array([10, 0, -10]) +C = ... # TODO + +# πŸ”’ Test +check_array(C, shape=(3,3), dtype=np.integer) +check_value(int(C[0,0] + C[2,2]), (A[0,0]+10) + (A[2,2]-10)) + + +# Q5) Fancy indexing / boolean masking +# Extract odd numbers β‰₯ 3 from A β†’ 'odd_ge3' +odd_ge3 = ... 
# TODO + +# πŸ”’ Test +check_array( + odd_ge3, + shape=(np.count_nonzero((A>=3)&(A%2==1)),), + dtype=np.integer, + allow_int_any=True +) +check_value(int(odd_ge3.min()), 3) From 8d2330cd09b2a35d9f0199b8ca47041d0c2a0972 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Thu, 21 Aug 2025 16:52:53 -0400 Subject: [PATCH 15/26] Add files via upload --- checkpoints/01_numpy_basics.ipynb | 137 ++++++++++++++++++++++++++++++ 1 file changed, 137 insertions(+) create mode 100644 checkpoints/01_numpy_basics.ipynb diff --git a/checkpoints/01_numpy_basics.ipynb b/checkpoints/01_numpy_basics.ipynb new file mode 100644 index 0000000..d237f78 --- /dev/null +++ b/checkpoints/01_numpy_basics.ipynb @@ -0,0 +1,137 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# βœ… Checkpoint 01 β€” NumPy Basics\n\n", + "**Goal**\n", + "- Create/reshape arrays, vectorized ops, boolean masking\n\n", + "**Rules**\n", + "- Fill only where marked as `# TODO`\n", + "- Do not change test cells (πŸ”’)\n", + "- Run all cells before submitting\n\n", + "**References**\n", + "- NumPy docs: https://numpy.org/doc/\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# πŸ”§ Setup\n", + "import numpy as np\n", + "import pandas as pd\n", + "from utils.grader import check_array, check_value\n\n", + "np.random.seed(42)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q1) Create a 3x3 array with values 0..8 (row-major)\n", + "# TODO: assign to variable 'A'\n", + "A = ... 
# TODO\n\n", + "# 🔒 Test\n", + "check_array(A, shape=(3,3), dtype=np.integer)\n", + "check_value(A.sum(), 36)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q2) From A, create a boolean mask selecting even numbers\n", + "# TODO: assign to variable 'mask_even'\n", + "mask_even = ... # TODO\n\n", + "# 🔒 Test\n", + "check_array(mask_even, shape=(3,3), dtype=bool)\n", + "check_value(int(mask_even.sum()), 5) # number of evens in 0..8\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q3) Reshape, stack, and compute row-wise means → 'means'\n", + "# TODO: assign to variable 'means' (1D array length 3)\n", + "B = ... # TODO\n", + "means = ... # TODO\n\n", + "# 🔒 Test\n", + "check_array(means, shape=(3,))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q4) Broadcasting: A (3x3) and v (1x3) → 'C'\n", + "v = np.array([10, 0, -10])\n", + "C = ... # TODO\n\n", + "# 🔒 Test\n", + "check_array(C, shape=(3,3), dtype=np.integer)\n", + "check_value(int(C[0,0] + C[2,2]), (A[0,0]+10) + (A[2,2]-10))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q5) Fancy indexing / boolean masking\n", + "# Extract odd numbers ≥ 3 from A → 'odd_ge3'\n", + "odd_ge3 = ...
# TODO\n\n", + "# 🔒 Test\n", + "check_array(\n", + " odd_ge3,\n", + " shape=(np.count_nonzero((A>=3)&(A%2==1)),),\n", + " dtype=np.integer,\n", + " allow_int_any=True\n", + ")\n", + "check_value(int(odd_ge3.min()), 3)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ✅ Submit\n", + "- All tests above passed\n", + "- Save notebook and commit to your repo\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11.8", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file From 8d83324a8f45a2771d87886a43812ccc452c2ea3 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Thu, 21 Aug 2025 17:39:13 -0400 Subject: [PATCH 16/26] Delete checkpoints/01_numpy_basics.md --- checkpoints/01_numpy_basics.md | 71 ---------------------------------- 1 file changed, 71 deletions(-) delete mode 100644 checkpoints/01_numpy_basics.md diff --git a/checkpoints/01_numpy_basics.md b/checkpoints/01_numpy_basics.md deleted file mode 100644 index 594e7bc..0000000 --- a/checkpoints/01_numpy_basics.md +++ /dev/null @@ -1,71 +0,0 @@ -# ✅ Checkpoint 01 — NumPy Basics - -**Goal** -- Create/reshape arrays, vectorized ops, boolean masking - -**Rules** -- Fill only where marked as `# TODO` -- Do not change test cells (🔒) -- Run all cells before submitting - -**References** -- NumPy docs: https://numpy.org/doc/ - ---- - -```python -# 🔧 Setup -import numpy as np -import pandas as pd -from utils.grader import check_array, check_value - -np.random.seed(42) -``` - - -# Q1) Create a 3x3 array with values 0..8 (row-major) -# TODO: assign to variable 'A' -A = ...
# TODO - -# 🔒 Test -check_array(A, shape=(3,3), dtype=np.integer) -check_value(A.sum(), 36) - -# Q2) From A, create a boolean mask selecting even numbers -# TODO: assign to variable 'mask_even' -mask_even = ... # TODO - -# 🔒 Test -check_array(mask_even, shape=(3,3), dtype=bool) -check_value(int(mask_even.sum()), 5) # number of evens in 0..8 - -# Q3) Reshape, stack, and compute row-wise means → 'means' -# TODO: assign to variable 'means' (1D array length 3) -B = ... # TODO -means = ... # TODO - -# 🔒 Test -check_array(means, shape=(3,)) - - -# Q4) Broadcasting: A (3x3) and v (1x3) → 'C' -v = np.array([10, 0, -10]) -C = ... # TODO - -# 🔒 Test -check_array(C, shape=(3,3), dtype=np.integer) -check_value(int(C[0,0] + C[2,2]), (A[0,0]+10) + (A[2,2]-10)) - - -# Q5) Fancy indexing / boolean masking -# Extract odd numbers ≥ 3 from A → 'odd_ge3' -odd_ge3 = ... # TODO - -# 🔒 Test -check_array( - odd_ge3, - shape=(np.count_nonzero((A>=3)&(A%2==1)),), - dtype=np.integer, - allow_int_any=True -) -check_value(int(odd_ge3.min()), 3) From fb886093082f6e868c2b23df076b48f673846c59 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Thu, 21 Aug 2025 19:51:05 -0400 Subject: [PATCH 17/26] Create grader --- checkpoints/utils/grader | 1 + 1 file changed, 1 insertion(+) create mode 100644 checkpoints/utils/grader diff --git a/checkpoints/utils/grader b/checkpoints/utils/grader new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/checkpoints/utils/grader @@ -0,0 +1 @@ + From ef40b2a489aaaeaaaf423397a5bd443d2c5f811d Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Thu, 21 Aug 2025 19:51:32 -0400 Subject: [PATCH 18/26] Add files via upload --- checkpoints/utils/grader.py | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) create mode 100644 checkpoints/utils/grader.py diff --git a/checkpoints/utils/grader.py b/checkpoints/utils/grader.py new file mode 100644
index 0000000..e496ae3 --- /dev/null +++ b/checkpoints/utils/grader.py @@ -0,0 +1,18 @@ +import numpy as np + +def check_array(arr, shape=None, dtype=None, allow_int_any=False): + if not isinstance(arr, np.ndarray): + raise AssertionError(f"❌ Expected numpy.ndarray, got {type(arr)}") + if shape is not None and arr.shape != shape: + raise AssertionError(f"❌ Wrong shape: expected {shape}, got {arr.shape}") + if dtype is not None: + if allow_int_any and np.issubdtype(arr.dtype, np.integer): + pass + elif not np.issubdtype(arr.dtype, dtype): + raise AssertionError(f"❌ Wrong dtype: expected {dtype}, got {arr.dtype}") + print("✅ Array check passed.") + +def check_value(val, expected): + if val != expected: + raise AssertionError(f"❌ Wrong value: expected {expected}, got {val}") + print("✅ Value check passed.") From e876d0fd9769a88dc010924a9a20b01586bcb5fc Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Thu, 21 Aug 2025 19:51:50 -0400 Subject: [PATCH 19/26] Delete checkpoints/utils/grader --- checkpoints/utils/grader | 1 - 1 file changed, 1 deletion(-) delete mode 100644 checkpoints/utils/grader diff --git a/checkpoints/utils/grader b/checkpoints/utils/grader deleted file mode 100644 index 8b13789..0000000 --- a/checkpoints/utils/grader +++ /dev/null @@ -1 +0,0 @@ - From a56a159a9f300b3277de20b8e4a113ace9a09186 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Sat, 23 Aug 2025 19:24:31 -0400 Subject: [PATCH 20/26] Create 02_pandas_basics.ipynb --- checkpoints/02_pandas_basics.ipynb | 167 +++++++++++++++++++++++++++++ 1 file changed, 167 insertions(+) create mode 100644 checkpoints/02_pandas_basics.ipynb diff --git a/checkpoints/02_pandas_basics.ipynb b/checkpoints/02_pandas_basics.ipynb new file mode 100644 index 0000000..839870c --- /dev/null +++ b/checkpoints/02_pandas_basics.ipynb @@ -0,0 +1,167 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, +
"source": [ + "# ✅ Checkpoint 02 — pandas Basics\n", + "\n", + "**Goal**\n", + "- Load/create DataFrames, filter & sort, add computed columns, groupby/aggregate, and merge.\n", + "\n", + "**Rules**\n", + "- Fill only where marked as `# TODO`\n", + "- Do not change test cells (🔒)\n", + "- Run all cells before submitting\n", + "\n", + "**References**\n", + "- pandas docs: https://pandas.pydata.org/docs/\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 🔧 Setup\n", + "import numpy as np\n", + "import pandas as pd\n", + "from utils.grader import (\n", + " check_array, check_value, check_dataframe_columns,\n", + " check_series_index_values, check_len\n", + ")\n", + "np.random.seed(42)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Small in-memory data we'll use throughout\n", + "data = {\n", + " 'city': ['Ann Arbor','Kalamazoo','Detroit','Grand Rapids','Lansing'],\n", + " 'temp_f': [68, 77, 59, 90, 82],\n", + " 'rain': [False, True, False, False, True],\n", + " 'date': pd.to_datetime(['2025-08-20','2025-08-20','2025-08-20','2025-08-20','2025-08-20'])\n", + "}\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q1) Create a DataFrame 'df' from the dict 'data' with columns in order: city, temp_f, rain, date\n", + "# TODO: assign to variable 'df'\n", + "df = ... # TODO\n", + "\n", + "# 🔒 Test\n", + "check_dataframe_columns(df, ['city','temp_f','rain','date'])\n", + "check_value(df.iloc[0]['city'], 'Ann Arbor')\n", + "check_len(df, 5)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q2) Filter rows where rain == False, sort by temp_f descending, reset index → 'df_dry'\n", + "# TODO: assign to variable 'df_dry'\n", + "df_dry = ...
# TODO\n", + "\n", + "# 🔒 Test\n", + "check_len(df_dry, 3)\n", + "check_value(df_dry.iloc[0]['temp_f'], 90)\n", + "check_dataframe_columns(df_dry, ['city','temp_f','rain','date'])\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q3) Add a Celsius column: temp_c = round((temp_f - 32) * 5/9, 1)\n", + "# TODO: create 'temp_c' column on df\n", + "...\n", + "\n", + "# 🔒 Test\n", + "check_value(float(df.loc[df['city']=='Grand Rapids','temp_c'].iloc[0]), round((90-32)*5/9,1))\n", + "check_dataframe_columns(df, ['city','temp_f','rain','date','temp_c'])\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q4) Group by 'rain' and compute mean temp_c → 'avg_temp_by_rain' (Series indexed by rain boolean)\n", + "# TODO: assign to variable 'avg_temp_by_rain'\n", + "avg_temp_by_rain = ... # TODO\n", + "\n", + "# 🔒 Test (values checked approximately)\n", + "check_series_index_values(avg_temp_by_rain, {False, True})\n", + "mean_false = avg_temp_by_rain.loc[False]\n", + "mean_true = avg_temp_by_rain.loc[True]\n", + "check_value(round(float(mean_false),1), round(((68-32)*5/9 + (59-32)*5/9 + (90-32)*5/9)/3, 1))\n", + "check_value(round(float(mean_true),1), round(((77-32)*5/9 + (82-32)*5/9)/2, 1))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q5) Merge: create a DataFrame 'city_region' with columns city and region, then left-merge onto df → 'df_merged'\n", + "city_region = pd.DataFrame({\n", + " 'city': ['Ann Arbor','Kalamazoo','Detroit','Grand Rapids','Lansing'],\n", + " 'region': ['SE','SW','SE','W','C']\n", + "})\n", + "# TODO: left-merge on 'city' to produce df_merged\n", + "df_merged = ...
# TODO\n", + "\n", + "# 🔒 Test\n", + "check_dataframe_columns(df_merged, ['city','temp_f','rain','date','temp_c','region'])\n", + "check_value(set(df_merged['region']), {'SE','SW','W','C'})\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ✅ Submit\n", + "- All tests above passed\n", + "- Save notebook and commit to your repo\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From d4a12100ff90748fb3fdfff4bb6bb996fa83d6fd Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Sun, 24 Aug 2025 20:29:26 -0400 Subject: [PATCH 21/26] Update grader.py --- checkpoints/utils/grader.py | 46 ++++++++++++++++++++++++++++++++----- 1 file changed, 40 insertions(+), 6 deletions(-) diff --git a/checkpoints/utils/grader.py b/checkpoints/utils/grader.py index e496ae3..6237e45 100644 --- a/checkpoints/utils/grader.py +++ b/checkpoints/utils/grader.py @@ -1,18 +1,52 @@ import numpy as np +import pandas as pd +def _fail(msg): + raise AssertionError(msg) + +# Numpy def check_array(arr, shape=None, dtype=None, allow_int_any=False): if not isinstance(arr, np.ndarray): - raise AssertionError(f"❌ Expected numpy.ndarray, got {type(arr)}") + _fail(f"❌ Expected numpy.ndarray, got {type(arr)}") if shape is not None and arr.shape != shape: - raise AssertionError(f"❌ Wrong shape: expected {shape}, got {arr.shape}") + _fail(f"❌ Wrong shape: expected {shape}, got {arr.shape}") if dtype is not None: if allow_int_any and np.issubdtype(arr.dtype, np.integer): pass elif not np.issubdtype(arr.dtype, dtype): - raise AssertionError(f"❌ Wrong dtype:
expected {dtype}, got {arr.dtype}") + _fail(f"❌ Wrong dtype: expected {dtype}, got {arr.dtype}") print("✅ Array check passed.") -def check_value(val, expected): - if val != expected: - raise AssertionError(f"❌ Wrong value: expected {expected}, got {val}") +def check_value(val, expected, tol=1e-8): + if isinstance(val, (float, np.floating)) or isinstance(expected, (float, np.floating)): + if abs(float(val) - float(expected)) > tol: + _fail(f"❌ Wrong value: expected {expected}, got {val}") + else: + if val != expected: + _fail(f"❌ Wrong value: expected {expected}, got {val}") print("✅ Value check passed.") + +# pandas +def check_dataframe_columns(df, expected_cols): + if not isinstance(df, pd.DataFrame): + _fail(f"❌ Expected pandas.DataFrame, got {type(df)}") + missing = [c for c in expected_cols if c not in df.columns] + if missing: + _fail(f"❌ Missing columns: {missing}") + print("✅ DataFrame columns check passed.") + +def check_series_index_values(s, expected_index_set): + if not isinstance(s, pd.Series): + _fail(f"❌ Expected pandas.Series, got {type(s)}") + if set(list(s.index)) != set(list(expected_index_set)): + _fail(f"❌ Unexpected index: got {list(s.index)}, expected set {list(expected_index_set)}") + print("✅ Series index check passed.") + +def check_len(obj, expected_len): + try: + n = len(obj) + except Exception as e: + _fail(f"❌ Object has no len(): {e}") + if n != expected_len: + _fail(f"❌ Wrong length: expected {expected_len}, got {n}") + print("✅ Length check passed.") From ba151f93e62b6d25075f6206d9936dd52644f7c3 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Sun, 24 Aug 2025 20:36:02 -0400 Subject: [PATCH 22/26] Create 03_matplotlib_seaborn.ipynb --- checkpoints/03_matplotlib_seaborn.ipynb | 237 ++++++++++++++++++++++++ 1 file changed, 237 insertions(+) create mode 100644 checkpoints/03_matplotlib_seaborn.ipynb diff --git a/checkpoints/03_matplotlib_seaborn.ipynb
b/checkpoints/03_matplotlib_seaborn.ipynb new file mode 100644 index 0000000..d87d586 --- /dev/null +++ b/checkpoints/03_matplotlib_seaborn.ipynb @@ -0,0 +1,237 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ✅ Checkpoint 03 — Matplotlib & Seaborn\n", + "\n", + "**Goal**\n", + "- Create basic plots with Matplotlib & Seaborn: scatter, histogram, boxplot, and aggregated barplot.\n", + "- Set titles/labels properly and export figures as files.\n", + "\n", + "**Rules**\n", + "- Fill only where marked as `# TODO`\n", + "- Do not change test cells (🔒)\n", + "- Run all cells in order before submitting\n", + "\n", + "**References**\n", + "- Matplotlib docs: https://matplotlib.org/stable/\n", + "- Seaborn docs: https://seaborn.pydata.org/\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 🔧 Setup\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import matplotlib.pyplot as plt\n", + "import seaborn as sns\n", + "from utils.grader import (\n", + " check_value, check_len, check_file_exists,\n", + " check_axes_instance, check_xlabel, check_ylabel, check_title_contains,\n", + " check_num_lines, check_num_collections, check_num_patches\n", + ")\n", + "np.random.seed(42)\n", + "\n", + "# ensure output dir\n", + "os.makedirs('outputs', exist_ok=True)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# small synthetic dataset (deterministic)\n", + "n = 120\n", + "days = np.random.choice(['Thur','Fri','Sat','Sun'], size=n, p=[0.25,0.2,0.3,0.25])\n", + "sex = np.random.choice(['Male','Female'], size=n)\n", + "smoker = np.random.choice(['Yes','No'], size=n, p=[0.3,0.7])\n", + "total_bill = np.round(np.random.normal(loc=24, scale=8, size=n).clip(5, 80), 2)\n", + "tip = np.round((total_bill * np.random.uniform(0.08, 0.22, size=n)), 2)\n", + "\n", + "df = pd.DataFrame({\n", + "
'day': days,\n", + " 'sex': sex,\n", + " 'smoker': smoker,\n", + " 'total_bill': total_bill,\n", + " 'tip': tip\n", + "})\n", + "df.head()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q1) Matplotlib Scatter\n", + "Create a scatter plot of `total_bill` (x) vs `tip` (y) using **Matplotlib**.\n", + "- Put the **x label**: `Total Bill ($)`\n", + "- Put the **y label**: `Tip ($)`\n", + "- Title should contain the word **\"Scatter\"**\n", + "- Save the fig object in a variable named **`fig1`**, axes in **`ax1`**\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: create fig1, ax1, draw scatter, set labels and title\n", + "fig1, ax1 = ... # TODO\n", + "\n", + "# 🔒 Test\n", + "check_axes_instance(ax1)\n", + "check_xlabel(ax1, 'Total Bill ($)')\n", + "check_ylabel(ax1, 'Tip ($)')\n", + "check_title_contains(ax1, 'Scatter')\n", + "check_num_collections(ax1, 1) # one scatter collection\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q2) Seaborn Boxplot\n", + "Using **Seaborn**, create a **boxplot** of `tip` by `day` (x=`day`, y=`tip`).\n", + "- Store the Axes in a variable named **`ax2`**\n", + "- x label must be `Day`, y label must be `Tip ($)`\n", + "- Title should contain the word **\"Box\"**\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: create ax2 using seaborn.boxplot\n", + "ax2 = ...
# TODO\n", + "...\n", + "\n", + "# 🔒 Test\n", + "check_axes_instance(ax2)\n", + "check_xlabel(ax2, 'Day')\n", + "check_ylabel(ax2, 'Tip ($)')\n", + "check_title_contains(ax2, 'Box')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q3) Matplotlib Histogram\n", + "Create a **histogram** of `total_bill` with **10 bins** using Matplotlib.\n", + "- Save fig as **`fig3`**, axes as **`ax3`**\n", + "- Title should contain **\"Histogram\"**\n", + "- x label `Total Bill ($)`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: histogram with 10 bins\n", + "fig3, ax3 = ... # TODO\n", + "...\n", + "\n", + "# 🔒 Test\n", + "check_axes_instance(ax3)\n", + "check_title_contains(ax3, 'Histogram')\n", + "check_xlabel(ax3, 'Total Bill ($)')\n", + "check_num_patches(ax3, 10)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q4) Seaborn Aggregated Barplot\n", + "Add a computed column `tip_pct = tip / total_bill * 100`. Then plot the **mean tip % by day** using Seaborn (barplot).\n", + "- Store the Axes in **`ax4`**\n", + "- There should be one bar per unique day in `df['day']`\n", + "- y label should contain the `%` sign\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: add tip_pct column and make barplot of mean tip_pct by day\n", + "...\n", + "ax4 = ...
# TODO\n", + "...\n", + "\n", + "# 🔒 Test\n", + "check_axes_instance(ax4)\n", + "unique_days = sorted(df['day'].unique().tolist())\n", + "check_len(ax4.patches, len(unique_days))\n", + "check_ylabel(ax4, '%') # contains percent sign\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q5) Save Figure to File\n", + "Save the Q1 scatter figure to `outputs/fig_scatter.png` using `fig1.savefig(...)`.\n", + "- The path must be exactly `outputs/fig_scatter.png`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: save fig1 to outputs/fig_scatter.png\n", + "...\n", + "\n", + "# 🔒 Test\n", + "check_file_exists('outputs/fig_scatter.png')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ✅ Submit\n", + "- All tests above passed\n", + "- Save notebook and commit to your repo\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From c2e100ecfd37b137a8206a21b72a6d8efbaac3b9 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Sun, 24 Aug 2025 21:54:40 -0400 Subject: [PATCH 23/26] Update grader.py --- checkpoints/utils/grader.py | 53 +++++++++++++++++++++++++++++++++++-- 1 file changed, 51 insertions(+), 2 deletions(-) diff --git a/checkpoints/utils/grader.py b/checkpoints/utils/grader.py index 6237e45..c431013 100644 --- a/checkpoints/utils/grader.py +++ b/checkpoints/utils/grader.py @@ -1,10 +1,13 @@ +import os import numpy as np import pandas as pd +import matplotlib +import matplotlib.pyplot as plt def _fail(msg): raise
AssertionError(msg) -# Numpy +# generic/NumPy/pandas def check_array(arr, shape=None, dtype=None, allow_int_any=False): if not isinstance(arr, np.ndarray): _fail(f"❌ Expected numpy.ndarray, got {type(arr)}") @@ -26,7 +29,6 @@ def check_value(val, expected, tol=1e-8): _fail(f"❌ Wrong value: expected {expected}, got {val}") print("✅ Value check passed.") -# pandas def check_dataframe_columns(df, expected_cols): if not isinstance(df, pd.DataFrame): _fail(f"❌ Expected pandas.DataFrame, got {type(df)}") @@ -50,3 +52,50 @@ def check_len(obj, expected_len): if n != expected_len: _fail(f"❌ Wrong length: expected {expected_len}, got {n}") print("✅ Length check passed.") + +def check_file_exists(path): + if not os.path.exists(path): + _fail(f"❌ File not found: {path}") + print("✅ File exists.") + +# Matplotlib/Seaborn helpers for checkpoint 03 +def check_axes_instance(ax): + if not hasattr(ax, "get_xlabel") or not hasattr(ax, "get_ylabel"): + _fail(f"❌ Expected a Matplotlib Axes-like object, got {type(ax)}") + print("✅ Axes instance check passed.") + +def check_xlabel(ax, expected): + label = ax.get_xlabel() + if label != expected and expected not in label: + _fail(f"❌ X label mismatch. Got '{label}', expected '{expected}' (or containing it).") + print("✅ X label ok.") + +def check_ylabel(ax, expected): + label = ax.get_ylabel() + if label != expected and expected not in label: + _fail(f"❌ Y label mismatch. Got '{label}', expected '{expected}' (or containing it).") + print("✅ Y label ok.") + +def check_title_contains(ax, keyword): + title = ax.get_title() + if keyword not in title: + _fail(f"❌ Title does not contain '{keyword}'.
Got '{title}'") + print("✅ Title contains keyword.") + +def check_num_lines(ax, expected_n): + n = len(ax.lines) + if n != expected_n: + _fail(f"❌ Expected {expected_n} line(s), got {n}") + print("✅ Number of lines ok.") + +def check_num_collections(ax, expected_n): + n = len(ax.collections) + if n != expected_n: + _fail(f"❌ Expected {expected_n} collection(s), got {n}") + print("✅ Number of collections ok.") + +def check_num_patches(ax, expected_n): + n = len(ax.patches) + if n != expected_n: + _fail(f"❌ Expected {expected_n} patch(es), got {n}") + print("✅ Number of patches ok.") From 45523f81b02952f9efab8b9dec4df2674a893770 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Sun, 24 Aug 2025 23:09:48 -0400 Subject: [PATCH 24/26] Create 04_plotly_intro.ipynb --- checkpoints/04_plotly_intro.ipynb | 237 ++++++++++++++++++++++++++++++ 1 file changed, 237 insertions(+) create mode 100644 checkpoints/04_plotly_intro.ipynb diff --git a/checkpoints/04_plotly_intro.ipynb b/checkpoints/04_plotly_intro.ipynb new file mode 100644 index 0000000..6912439 --- /dev/null +++ b/checkpoints/04_plotly_intro.ipynb @@ -0,0 +1,237 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ✅ Checkpoint 04 — Plotly Intro\n", + "\n", + "**Goal**\n", + "- Build interactive charts with Plotly (scatter, histogram, bar) using both Express and Graph Objects.\n", + "- Set titles/axis labels, count traces, and export figures to HTML.\n", + "\n", + "**Rules**\n", + "- Fill only where marked as `# TODO`.\n", + "- Do not change test cells (🔒).\n", + "- Run all cells in order before submitting.\n", + "\n", + "**References**\n", + "- Plotly docs: https://plotly.com/python/\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# 🔧 Setup\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import plotly.express as px\n", +
"import plotly.graph_objects as go\n", + "from utils.grader import (\n", + " check_file_exists,\n", + " check_figure, check_trace_count,\n", + " check_axis_title, check_layout_title_contains,\n", + " check_bar_count, check_trace_modes\n", + ")\n", + "np.random.seed(42)\n", + "os.makedirs('outputs', exist_ok=True)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Small deterministic dataset (similar to 'tips')\n", + "n = 120\n", + "days = np.random.choice(['Thur','Fri','Sat','Sun'], size=n, p=[0.25,0.2,0.3,0.25])\n", + "sex = np.random.choice(['Male','Female'], size=n)\n", + "smoker = np.random.choice(['Yes','No'], size=n, p=[0.3,0.7])\n", + "total_bill = np.round(np.random.normal(loc=24, scale=8, size=n).clip(5, 80), 2)\n", + "tip = np.round((total_bill * np.random.uniform(0.08, 0.22, size=n)), 2)\n", + "df = pd.DataFrame({\n", + " 'day': days,\n", + " 'sex': sex,\n", + " 'smoker': smoker,\n", + " 'total_bill': total_bill,\n", + " 'tip': tip\n", + "})\n", + "df.head()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q1) Plotly Express — Scatter\n", + "Create a scatter plot of `total_bill` (x) vs `tip` (y) using **Plotly Express**.\n", + "- Color by `day` (optional but encouraged).\n", + "- Title should contain **\"Scatter\"**.\n", + "- x-axis title: `Total Bill ($)`; y-axis title: `Tip ($)`.\n", + "- Store the figure in **`fig1`**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: create fig1 with px.scatter\n", + "fig1 = ...
# TODO\n", + "# Example (for reference):\n", + "# fig1 = px.scatter(df, x='total_bill', y='tip', color='day', title='Scatter: Tip vs Total Bill')\n", + "# fig1.update_layout(xaxis_title='Total Bill ($)', yaxis_title='Tip ($)')\n", + "\n", + "# 🔒 Test\n", + "check_figure(fig1)\n", + "check_trace_count(fig1, expected_min=1) # at least 1 trace (color may create >1)\n", + "check_layout_title_contains(fig1, 'Scatter')\n", + "check_axis_title(fig1, axis='x', expected='Total Bill ($)')\n", + "check_axis_title(fig1, axis='y', expected='Tip ($)')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q2) Plotly Express — Histogram\n", + "Create a histogram of `total_bill` with **10 bins**.\n", + "- Title should contain **\"Histogram\"**.\n", + "- x-axis title: `Total Bill ($)`.\n", + "- Store the figure in **`fig2`**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: create fig2 with px.histogram and nbins=10\n", + "fig2 = ... # TODO\n", + "# Example:\n", + "# fig2 = px.histogram(df, x='total_bill', nbins=10, title='Histogram: Total Bill')\n", + "# fig2.update_layout(xaxis_title='Total Bill ($)')\n", + "\n", + "# 🔒 Test\n", + "check_figure(fig2)\n", + "check_trace_count(fig2, expected_min=1)\n", + "check_layout_title_contains(fig2, 'Histogram')\n", + "check_axis_title(fig2, axis='x', expected='Total Bill ($)')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q3) Plotly Express — Bar (mean tip%)\n", + "Add a computed column `tip_pct = tip / total_bill * 100`. Then plot the **mean tip % by day** as a bar chart.\n", + "- One bar per unique `day`.\n", + "- y-axis title should contain `%`.\n", + "- Store the figure in **`fig3`**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: compute tip_pct and create fig3\n", + "...\n", + "fig3 = ...
# TODO\n", + "\n", + "# 🔒 Test\n", + "check_figure(fig3)\n", + "unique_days = sorted(df['day'].unique().tolist())\n", + "check_bar_count(fig3, expected=len(unique_days))\n", + "check_axis_title(fig3, axis='y', expected='%') # contains percent sign\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q4) Graph Objects — Line (running mean of tip)\n", + "Using **plotly.graph_objects**, build a line chart of the running mean of `tip` over row index.\n", + "- Use `go.Figure` with a single `go.Scatter` trace in `'lines'` mode.\n", + "- Title should contain **\"Running Mean\"**.\n", + "- Store the figure in **`fig4`**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: create running mean and fig4 with go.Figure\n", + "...\n", + "fig4 = ... # TODO\n", + "\n", + "# 🔒 Test\n", + "check_figure(fig4)\n", + "check_trace_count(fig4, expected_min=1, expected_max=1)\n", + "check_trace_modes(fig4, must_include='lines')\n", + "check_layout_title_contains(fig4, 'Running Mean')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q5) Export to HTML\n", + "Save the Q1 scatter figure to **`outputs/fig_scatter.html`** using `fig1.write_html(...)`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: export fig1 to outputs/fig_scatter.html\n", + "...\n", + "\n", + "# 🔒 Test\n", + "check_file_exists('outputs/fig_scatter.html')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### ✅ Submit\n", + "- All tests above passed\n", + "- Save notebook and commit to your repo\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + },
+ "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From 0028375e3ca254efb1559f9ff741b5174b92ec87 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Mon, 25 Aug 2025 02:21:55 -0400 Subject: [PATCH 25/26] Update grader.py --- checkpoints/utils/grader.py | 71 +++++++++++++++++++++++++++++++++++-- 1 file changed, 69 insertions(+), 2 deletions(-) diff --git a/checkpoints/utils/grader.py b/checkpoints/utils/grader.py index c431013..4ac8796 100644 --- a/checkpoints/utils/grader.py +++ b/checkpoints/utils/grader.py @@ -1,3 +1,4 @@ +# utils/grader.py import os import numpy as np import pandas as pd @@ -7,7 +8,7 @@ def _fail(msg): raise AssertionError(msg) -# generic/NumPy/pandas +# Generic / NumPy / pandas def check_array(arr, shape=None, dtype=None, allow_int_any=False): if not isinstance(arr, np.ndarray): _fail(f"❌ Expected numpy.ndarray, got {type(arr)}") @@ -58,7 +59,7 @@ def check_file_exists(path): _fail(f"❌ File not found: {path}") print("✅ File exists.") -# Matplotlib/Seaborn helpers for checkpoint 03 +# Matplotlib / Seaborn helpers def check_axes_instance(ax): if not hasattr(ax, "get_xlabel") or not hasattr(ax, "get_ylabel"): _fail(f"❌ Expected a Matplotlib Axes-like object, got {type(ax)}") @@ -99,3 +100,69 @@ def check_num_patches(ax, expected_n): if n != expected_n: _fail(f"❌ Expected {expected_n} patch(es), got {n}") print("✅ Number of patches ok.") + +# Plotly helpers +def check_figure(fig): + try: + import plotly.graph_objects as go + except Exception as e: + _fail(f"❌ Plotly not installed: {e}") + if not isinstance(fig, go.Figure): + _fail(f"❌ Expected plotly.graph_objects.Figure, got {type(fig)}") + print("✅ Figure instance ok.") + +def check_trace_count(fig, expected_min=None, expected_max=None): + n = len(fig.data) + if expected_min is not None and n < expected_min: + _fail(f"❌ Too few traces: got {n}, expected
>= {expected_min}") + if expected_max is not None and n > expected_max: + _fail(f"❌ Too many traces: got {n}, expected <= {expected_max}") + print("✅ Trace count ok.") + +def _get_axis(fig, axis): + if axis == 'x': + return fig.layout.xaxis + elif axis == 'y': + return fig.layout.yaxis + else: + _fail("❌ axis must be 'x' or 'y'") + +def check_axis_title(fig, axis='x', expected=None): + ax = _get_axis(fig, axis) + title = getattr(ax.title, "text", "") if ax.title else "" + if expected is None: + _fail("❌ expected title text is None") + if expected != title and (expected not in title): + _fail(f"❌ {axis}-axis title mismatch. Got '{title}', expected '{expected}' (or containing it).") + print(f"✅ {axis.upper()} axis title ok.") + +def check_layout_title_contains(fig, keyword): + title = getattr(fig.layout.title, "text", "") if fig.layout.title else "" + if keyword not in title: + _fail(f"❌ Layout title does not contain '{keyword}'. Got '{title}'") + print("✅ Layout title contains keyword.") + +def check_bar_count(fig, expected): + if len(fig.data) == 0: + _fail("❌ No traces in figure.") + trace = fig.data[0] + xs = getattr(trace, "x", None) + if xs is None: + _fail("❌ Bar trace has no x values.") + n = len(xs) + if n != expected: + _fail(f"❌ Expected {expected} bars, got {n}") + print("✅ Bar count ok.") + +def check_trace_modes(fig, must_include='lines'): + if len(fig.data) == 0: + _fail("❌ No traces in figure.") + modes = [] + for t in fig.data: + mode = getattr(t, "mode", None) + if mode: + modes.append(mode) + joined = ",".join(modes) + if must_include not in joined: + _fail(f"❌ Required mode '{must_include}' not found in traces.
Got modes: {modes}") + print("βœ… Trace mode ok.") From 8cdde1248efae90de785a6e761db5ba6375267f4 Mon Sep 17 00:00:00 2001 From: Aiden <113921954+cereal-with-water@users.noreply.github.com> Date: Thu, 4 Sep 2025 16:46:47 -0400 Subject: [PATCH 26/26] Update README.md --- README.md | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 73 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index cff496d..8aaeca6 100644 --- a/README.md +++ b/README.md @@ -213,9 +213,79 @@ stacked = np.vstack([c, d]) # vertical stack of two 2Γ—3 arrays
- 🔥Basic Machine Learning with scikit-learn🔥
- Build your first regression and classification models, split data, and evaluate performance.
-
+ 🔥Basic Machine Learning with scikit-learn🔥
+Build your first regression and classification models, split data, and evaluate performance.
+
+## 🔍 Library Overview
+scikit-learn is one of the most widely used ML libraries in Python.
+It provides simple APIs for preprocessing, training models, and evaluating performance.
+
+### ✨ Key Features
+- Large collection of supervised & unsupervised algorithms
+- Easy dataset splitting, scaling, and pipelines
+- Built-in metrics for evaluation
+- Works seamlessly with NumPy & pandas
+
+---
+
+### 1. What
+> **What you will learn in this section.**
+> By the end of this notebook, you will be able to:
+> - Split data into train/test sets
+> - Train a simple regression model
+> - Train a classification model
+> - Evaluate predictions using accuracy and error metrics
+
+---
+
+### 2. Why
+> **Why this topic matters.**
+> - Machine Learning is the core of many data science projects.
+> - scikit-learn offers a consistent interface to try many models quickly.
+> - Understanding the ML workflow (split → train → predict → evaluate) is essential.
+
+---
+### 3. How
+> **How to do it.**
+> Follow these hands-on examples:
+
+```python
+from sklearn.datasets import load_iris, make_regression
+from sklearn.model_selection import train_test_split
+from sklearn.linear_model import LinearRegression, LogisticRegression
+from sklearn.metrics import mean_squared_error, accuracy_score
+import numpy as np
+
+# --- Regression Example ---
+# Generate synthetic data
+X_reg, y_reg = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
+
+# Train/test split
+X_train, X_test, y_train, y_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
+
+# Fit linear regression
+reg = LinearRegression()
+reg.fit(X_train, y_train)
+
+# Predict and evaluate
+y_pred = reg.predict(X_test)
+print("MSE (Regression):", mean_squared_error(y_test, y_pred))
+
+
+# --- Classification Example ---
+iris = load_iris()
+X_clf, y_clf = iris.data, iris.target
+
+X_train, X_test, y_train, y_test = train_test_split(X_clf, y_clf, test_size=0.2, random_state=42)
+
+clf = LogisticRegression(max_iter=200)
+clf.fit(X_train, y_train)
+
+y_pred = clf.predict(X_test)
+print("Accuracy (Classification):", accuracy_score(y_test, y_pred))
+
+```
+
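
The README example reports two numbers, an MSE for the regression model and an accuracy for the classifier. As a rough illustration of what those metrics compute, here is a minimal sketch in plain Python. The helper names `mse` and `accuracy` and the sample values are hypothetical and exist only for this illustration; they are not part of scikit-learn, whose own functions are `mean_squared_error` and `accuracy_score`.

```python
# Illustrative helpers with hypothetical names; not part of scikit-learn.

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up regression targets and predictions: squared errors are 1, 0 and 4.
reg_true = [3.0, 5.0, 7.0]
reg_pred = [2.0, 5.0, 9.0]
print("MSE:", mse(reg_true, reg_pred))

# Made-up class labels: three of the four predictions are correct.
clf_true = [0, 1, 2, 2]
clf_pred = [0, 1, 1, 2]
print("Accuracy:", accuracy(clf_true, clf_pred))
```

A lower MSE means predictions sit closer to the targets, while accuracy ranges from 0 to 1 and counts only exact label matches, which is why the two are not interchangeable between regression and classification.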