diff --git a/README.md b/README.md index c3138ed..8aaeca6 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,291 @@ # Python-Data-Science-Onboarding -Coming Soon + +Welcome to the WMU DSC/Developer Club!
+This repository is designed to help new members get familiar with the tools and workflows commonly used in our data science projects. +
+
+
+
+
+## πŸš€ Who is this for?
+
+This tutorial assumes you already have *basic Python and data-science experience*, including:
+
+- Using NumPy and pandas for data handling
+- Knowing what a `.ipynb` Jupyter Notebook file is
+- Using scikit-learn to build simple machine learning models
+
+<details>
+<summary>❓Don't know Python yet? No problem!❓</summary>
+ +> **Start with the resources below before continuing:**
+>   [W3Schools Python Tutorial](https://www.w3schools.com/python/)
+>   [Google's Python Class](https://developers.google.com/edu/python)
+>   [Python for Beginners (YouTube)](https://www.youtube.com/watch?v=K5KVEU3aaeQ&t=56s)
+
+</details>
+
+
+<details>
+<summary>❓Python Installation Guide For Beginners❓</summary>
+ +> ### To follow along with the notebooks in this repository, you need Python installed on your machine. +> ### πŸŽ₯ How to Install Python +>    [For macOS](https://www.youtube.com/watch?v=nhv82tvFfkM) +>    [For Windows](https://www.youtube.com/watch?v=YagM_FuPLQU)

+> πŸ“Œ *Important*: During installation, make sure to check:
+> *β€œAdd Python to PATH”*
+
+### Verify Your Installation
+
+After installing, open a terminal (or Command Prompt on Windows) and run:
+
+```bash
+python --version
+pip --version
+```
+
+If the `python` command isn't found, try `python3 --version` instead; on macOS and Linux the interpreter is often installed as `python3`.
+
+</details>
+
+
+
+
+
+## πŸ“¦ Recommended Libraries
+
+In Python, you install a package by running (replace `<package-name>` with the library you want):
+```bash
+pip install <package-name>
+```
+
+Before you dive into the notebooks, make sure you have the core data-science libraries installed. You can install them all at once via pip:
+
+```bash
+pip install \
+  numpy \
+  pandas \
+  matplotlib \
+  seaborn \
+  scikit-learn \
+  notebook
+```
+
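+Once the installation finishes, a quick way to confirm that everything imports cleanly (a minimal sanity check, not part of the notebooks) is to run the following in a Python shell:
+
+```python
+# If any of these imports fail, re-run the pip install command above.
+import matplotlib
+import numpy
+import pandas
+import seaborn
+import sklearn
+
+print("numpy:", numpy.__version__)
+print("pandas:", pandas.__version__)
+print("scikit-learn:", sklearn.__version__)
+```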
+
+
+
+
+## πŸ“˜ Core Topics
+
+<details>
+<summary>πŸ”₯Understanding Jupyter Notebooks (.ipynb)πŸ”₯</summary>
+
+What Markdown (text) and code cells are, how to run them, and best practices for documenting your analysis (see the short example at the end of this section).
+
+# πŸ“ Jupyter Notebook Quickstart Guide
+
+This guide introduces Jupyter Notebook: what it is, how to install and use it locally or in the cloud, and how to work with cells, Markdown, and sharing.
+
+---
+
+## πŸ” What Is Jupyter Notebook?
+
+Jupyter Notebook is an interactive computing environment where you can combine live code, equations, visualizations, and narrative text in a single document (`.ipynb`). It’s widely used for data analysis, teaching, and rapid prototyping.
+
+- **Key Features**
+  - Interactive code execution
+  - Rich text via Markdown (headings, lists, LaTeX)
+  - Inline data visualizations
+  - Easy sharing and reproducibility
+
+---
+
+## βš™οΈ Installation & Access
+
+### 1. Install Locally
+
+You’ll need Python installed first. Then:
+
+```bash
+# Install Jupyter Notebook via pip
+pip install notebook
+```
+Or, if you use Conda:
+```bash
+conda install -c conda-forge notebook
+```
+After installation, launch the notebook server:
+```bash
+jupyter notebook
+```
+Your default browser will open at `http://localhost:8888`, showing the notebook dashboard.
+
+### 2. Use JupyterLab (Optional)
+For a more full-featured interface:
+
+```bash
+pip install jupyterlab
+jupyter lab
+```
+### 3. Cloud / Web Options
+**Google Colab**
+
+1. Go to [colab.research.google.com](https://colab.research.google.com)
+2. Sign in with your Google account
+3. Open or upload any `.ipynb` file
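+
+To get a feel for the two cell types, here is a minimal pair you could try in a fresh notebook (the variable names are just illustrative):
+
+```python
+# A code cell holds Python; run it with Shift+Enter.
+import numpy as np
+
+data = np.random.rand(5)   # imports and variables persist across cells once run
+data.mean()                # a cell displays the value of its last expression
+```
+
+A Markdown (text) cell, by contrast, holds formatted notes (e.g. `# Heading`, `**bold**`, bullet lists) and renders as rich text when you run it.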
+
+</details>
+
+
+<details>
+<summary>πŸ”₯Data Handling with NumPy & PandasπŸ”₯</summary>
+
+Learn how to load, clean, and manipulate data using NumPy arrays and Pandas DataFrames.
+
+## πŸ” Library Overview
+
+Before we dive in, here's a quick intro to the two core libraries we’ll use:
+
+### NumPy
+- **The fundamental package for numerical computing in Python.**
+- **Key features:**
+  - **Arrays:** Homogeneous, N-dimensional arrays (faster and more memory-efficient than Python lists)
+  - **Vectorized ops:** Element-wise arithmetic without explicit loops
+  - **Linear algebra & random:** Built-in support for matrix operations and pseudo-random number generation
+
+### Pandas
+- **A powerful data analysis and manipulation library built on top of NumPy.**
+- **Key features:**
+  - **DataFrame:** 2D tabular data structure with labeled axes (rows & columns)
+  - **IO tools:** Read/write CSV, Excel, SQL, JSON, and more
+  - **Series:** 1D labeled array, great for time series and single-column tables
+  - **Grouping & aggregation:** Split-apply-combine workflows for summarizing data
+
+A short pandas example follows the NumPy walkthrough below.
+
+### 1. What
+> **What you will learn in this section.**
+> By the end of this notebook, you will be able to:
+> - Create and manipulate NumPy arrays of different shapes and dtypes
+> - Perform element-wise arithmetic and apply universal functions
+> - Index, slice, and reshape arrays for efficient computation
+
+---
+
+### 2. Why
+> **Why this topic matters.**
+> NumPy arrays are the foundation of nearly all scientific computing in Python.
+> They provide:
+> - **Speed:** Vectorized operations run much faster than Python loops
+> - **Memory efficiency:** Compact storage of homogeneous data
+> - **Interoperability:** A common data structure for libraries like Pandas, SciPy, and scikit-learn
+
+---
+
+### 3. How
+> **How to do it.**
+> Follow these step-by-step examples:
+
+```python
+import numpy as np
+
+# 1) Create arrays
+a = np.array([1, 2, 3, 4])
+b = np.arange(0, 10, 2)          # [0, 2, 4, 6, 8]
+c = np.zeros((2, 3), dtype=int)  # 2Γ—3 array of zeros
+
+# 2) Element-wise arithmetic
+sum_ab = a + b[:4]   # adds element by element
+prod_ab = a * b[:4]  # multiplies element by element
+
+# 3) Universal functions
+sqrt_b = np.sqrt(b)  # square root of each element
+exp_a = np.exp(a)    # eᡃ for each element
+
+# 4) Indexing & slicing
+row = b[2:5]         # slice a subarray: [4, 6, 8]
+c[0, :] = row        # assign it to the first row of c
+
+# 5) Reshape & combine
+d = np.linspace(0, 1, 6).reshape(2, 3)
+stacked = np.vstack([c, d])      # stack two 2Γ—3 arrays into one 4Γ—3 array
+```
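+
+The pandas features from the overview deserve the same treatment. Here is a short, self-contained sketch (the cities and temperatures are invented for illustration) covering DataFrame creation, boolean filtering, a computed column, and a groupby:
+
+```python
+import pandas as pd
+
+# 1) Build a DataFrame from a dict of columns
+df = pd.DataFrame({
+    'city': ['Kalamazoo', 'Detroit', 'Lansing', 'Kalamazoo'],
+    'temp_f': [77, 59, 82, 68],
+})
+
+# 2) Filter rows with a boolean mask
+warm = df[df['temp_f'] > 70]
+
+# 3) Add a computed column (vectorized, no loop needed)
+df['temp_c'] = (df['temp_f'] - 32) * 5 / 9
+
+# 4) Split-apply-combine: mean Celsius temperature per city
+print(df.groupby('city')['temp_c'].mean())
+```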
+
+</details>
+
+
+<details>
+<summary>πŸ”₯Basic Machine Learning with scikit-learnπŸ”₯</summary>
+
+Build your first regression and classification models, split data, and evaluate performance.
+
+## πŸ” Library Overview
+scikit-learn is one of the most widely used ML libraries in Python.
+It provides simple APIs for preprocessing, training models, and evaluating performance.
+
+### ✨ Key Features
+- Large collection of supervised & unsupervised algorithms
+- Easy dataset splitting, scaling, and pipelines
+- Built-in metrics for evaluation
+- Works seamlessly with NumPy & pandas
+
+---
+
+### 1. What
+> **What you will learn in this section.**
+> By the end of this notebook, you will be able to:
+> - Split data into train/test sets
+> - Train a simple regression model
+> - Train a classification model
+> - Evaluate predictions using accuracy and error metrics
+
+---
+
+### 2. Why
+> **Why this topic matters.**
+> - Machine Learning is the core of many data science projects.
+> - scikit-learn offers a consistent interface to try many models quickly.
+> - Understanding the ML workflow (split β†’ train β†’ predict β†’ evaluate) is essential.
+
+---
+
+### 3. How
+> **How to do it.**
+> Follow these hands-on examples:
+
+```python
+from sklearn.datasets import load_iris, make_regression
+from sklearn.model_selection import train_test_split
+from sklearn.linear_model import LinearRegression, LogisticRegression
+from sklearn.metrics import mean_squared_error, accuracy_score
+
+# --- Regression Example ---
+# Generate synthetic data
+X_reg, y_reg = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
+
+# Train/test split
+X_train, X_test, y_train, y_test = train_test_split(X_reg, y_reg, test_size=0.2, random_state=42)
+
+# Fit linear regression
+reg = LinearRegression()
+reg.fit(X_train, y_train)
+
+# Predict and evaluate
+y_pred = reg.predict(X_test)
+print("MSE (Regression):", mean_squared_error(y_test, y_pred))
+
+# --- Classification Example ---
+iris = load_iris()
+X_clf, y_clf = iris.data, iris.target
+
+X_train, X_test, y_train, y_test = train_test_split(X_clf, y_clf, test_size=0.2, random_state=42)
+
+clf = LogisticRegression(max_iter=200)
+clf.fit(X_train, y_train)
+
+y_pred = clf.predict(X_test)
+print("Accuracy (Classification):", accuracy_score(y_test, y_pred))
+```
+
+</details>
+ + diff --git a/checkpoints/01_numpy_basics.ipynb b/checkpoints/01_numpy_basics.ipynb new file mode 100644 index 0000000..d237f78 --- /dev/null +++ b/checkpoints/01_numpy_basics.ipynb @@ -0,0 +1,137 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# βœ… Checkpoint 01 β€” NumPy Basics\n\n", + "**Goal**\n", + "- Create/reshape arrays, vectorized ops, boolean masking\n\n", + "**Rules**\n", + "- Fill only where marked as `# TODO`\n", + "- Do not change test cells (πŸ”’)\n", + "- Run all cells before submitting\n\n", + "**References**\n", + "- NumPy docs: https://numpy.org/doc/\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# πŸ”§ Setup\n", + "import numpy as np\n", + "import pandas as pd\n", + "from utils.grader import check_array, check_value\n\n", + "np.random.seed(42)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q1) Create a 3x3 array with values 0..8 (row-major)\n", + "# TODO: assign to variable 'A'\n", + "A = ... # TODO\n\n", + "# πŸ”’ Test\n", + "check_array(A, shape=(3,3), dtype=np.integer)\n", + "check_value(A.sum(), 36)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q2) From A, create a boolean mask selecting even numbers\n", + "# TODO: assign to variable 'mask_even'\n", + "mask_even = ... # TODO\n\n", + "# πŸ”’ Test\n", + "check_array(mask_even, shape=(3,3), dtype=bool)\n", + "check_value(int(mask_even.sum()), 5) # number of evens in 0..8\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q3) Reshape, stack, and compute row-wise means β†’ 'means'\n", + "# TODO: assign to variable 'means' (1D array length 3)\n", + "B = ... # TODO\n", + "means = ... # TODO\n\n", + "# πŸ”’ Test\n", + "check_array(means, shape=(3,))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q4) Broadcasting: A (3x3) and v (1x3) β†’ 'C'\n", + "v = np.array([10, 0, -10])\n", + "C = ... # TODO\n\n", + "# πŸ”’ Test\n", + "check_array(C, shape=(3,3), dtype=np.integer)\n", + "check_value(int(C[0,0] + C[2,2]), (A[0,0]+10) + (A[2,2]-10))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q5) Fancy indexing / boolean masking\n", + "# Extract odd numbers β‰₯ 3 from A β†’ 'odd_ge3'\n", + "odd_ge3 = ... 
# TODO\n\n", + "# πŸ”’ Test\n", + "check_array(\n", + " odd_ge3,\n", + " shape=(np.count_nonzero((A>=3)&(A%2==1)),),\n", + " dtype=np.integer,\n", + " allow_int_any=True\n", + ")\n", + "check_value(int(odd_ge3.min()), 3)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### βœ… Submit\n", + "- All tests above passed\n", + "- Save notebook and commit to your repo\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11.8", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} \ No newline at end of file diff --git a/checkpoints/02_pandas_basics.ipynb b/checkpoints/02_pandas_basics.ipynb new file mode 100644 index 0000000..839870c --- /dev/null +++ b/checkpoints/02_pandas_basics.ipynb @@ -0,0 +1,167 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# βœ… Checkpoint 02 β€” pandas Basics\n", + "\n", + "**Goal**\n", + "- Load/create DataFrames, filter & sort, add computed columns, groupby/aggregate, and merge.\n", + "\n", + "**Rules**\n", + "- Fill only where marked as `# TODO`\n", + "- Do not change test cells (πŸ”’)\n", + "- Run all cells before submitting\n", + "\n", + "**References**\n", + "- pandas docs: https://pandas.pydata.org/docs/\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# πŸ”§ Setup\n", + "import numpy as np\n", + "import pandas as pd\n", + "from utils.grader import (\n", + " check_array, check_value, check_dataframe_columns,\n", + " check_series_index_values, check_len\n", + ")\n", + "np.random.seed(42)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Small in-memory data we'll use throughout\n", + "data = {\n", + " 'city': ['Ann Arbor','Kalamazoo','Detroit','Grand Rapids','Lansing'],\n", + " 'temp_f': [68, 77, 59, 90, 82],\n", + " 'rain': [False, True, False, False, True],\n", + " 'date': pd.to_datetime(['2025-08-20','2025-08-20','2025-08-20','2025-08-20','2025-08-20'])\n", + "}\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q1) Create a DataFrame 'df' from the dict 'data' with columns in order: city, temp_f, rain, date\n", + "# TODO: assign to variable 'df'\n", + "df = ... # TODO\n", + "\n", + "# πŸ”’ Test\n", + "check_dataframe_columns(df, ['city','temp_f','rain','date'])\n", + "check_value(df.iloc[0]['city'], 'Ann Arbor')\n", + "check_len(df, 5)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q2) Filter rows where rain == False, sort by temp_f descending, reset index β†’ 'df_dry'\n", + "# TODO: assign to variable 'df_dry'\n", + "df_dry = ... 
# TODO\n", + "\n", + "# πŸ”’ Test\n", + "check_len(df_dry, 3)\n", + "check_value(df_dry.iloc[0]['temp_f'], 90)\n", + "check_dataframe_columns(df_dry, ['city','temp_f','rain','date'])\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q3) Add a Celsius column: temp_c = round((temp_f - 32) * 5/9, 1)\n", + "# TODO: create 'temp_c' column on df\n", + "...\n", + "\n", + "# πŸ”’ Test\n", + "check_value(float(df.loc[df['city']=='Grand Rapids','temp_c'].iloc[0]), round((90-32)*5/9,1))\n", + "check_dataframe_columns(df, ['city','temp_f','rain','date','temp_c'])\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q4) Group by 'rain' and compute mean temp_c β†’ 'avg_temp_by_rain' (Series indexed by rain boolean)\n", + "# TODO: assign to variable 'avg_temp_by_rain'\n", + "avg_temp_by_rain = ... # TODO\n", + "\n", + "# πŸ”’ Test (values checked approximately)\n", + "check_series_index_values(avg_temp_by_rain, {False, True})\n", + "mean_false = avg_temp_by_rain.loc[False]\n", + "mean_true = avg_temp_by_rain.loc[True]\n", + "check_value(round(float(mean_false),1), round(((68-32)*5/9 + (59-32)*5/9 + (90-32)*5/9)/3, 1))\n", + "check_value(round(float(mean_true),1), round(((77-32)*5/9 + (82-32)*5/9)/2, 1))\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Q5) Merge: create a DataFrame 'city_region' with columns city and region, then left-merge onto df β†’ 'df_merged'\n", + "city_region = pd.DataFrame({\n", + " 'city': ['Ann Arbor','Kalamazoo','Detroit','Grand Rapids','Lansing'],\n", + " 'region': ['SE','SW','SE','W','C']\n", + "})\n", + "# TODO: left-merge on 'city' to produce df_merged\n", + "df_merged = ... 
# TODO\n",
+    "\n",
+    "# πŸ”’ Test\n",
+    "check_dataframe_columns(df_merged, ['city','temp_f','rain','date','temp_c','region'])\n",
+    "check_value(set(df_merged['region']), {'SE','SW','W','C'})\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### βœ… Submit\n",
+    "- All tests above passed\n",
+    "- Save notebook and commit to your repo\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python",
+   "version": "3.11",
+   "mimetype": "text/x-python",
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "pygments_lexer": "ipython3",
+   "nbconvert_exporter": "python",
+   "file_extension": ".py"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/checkpoints/03_matplotlib_seaborn.ipynb b/checkpoints/03_matplotlib_seaborn.ipynb
new file mode 100644
index 0000000..d87d586
--- /dev/null
+++ b/checkpoints/03_matplotlib_seaborn.ipynb
@@ -0,0 +1,237 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# βœ… Checkpoint 03 β€” Matplotlib & Seaborn\n",
+    "\n",
+    "**Goal**\n",
+    "- Create basic plots with Matplotlib & Seaborn: scatter, histogram, boxplot, and aggregated barplot.\n",
+    "- Set titles/labels properly and export figures as files.\n",
+    "\n",
+    "**Rules**\n",
+    "- Fill only where marked as `# TODO`\n",
+    "- Do not change test cells (πŸ”’)\n",
+    "- Run all cells in order before submitting\n",
+    "\n",
+    "**References**\n",
+    "- Matplotlib docs: https://matplotlib.org/stable/\n",
+    "- Seaborn docs: https://seaborn.pydata.org/\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# πŸ”§ Setup\n",
+    "import os\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import matplotlib.pyplot as plt\n",
+    "import seaborn as sns\n",
+    "from utils.grader import (\n",
+    "    check_value, check_len, check_file_exists,\n",
+    "    check_axes_instance, check_xlabel, check_ylabel, check_title_contains,\n",
+    "    check_num_lines, check_num_collections, check_num_patches\n",
+    ")\n",
+    "np.random.seed(42)\n",
+    "\n",
+    "# ensure output dir\n",
+    "os.makedirs('outputs', exist_ok=True)\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# small synthetic dataset (deterministic)\n",
+    "n = 120\n",
+    "days = np.random.choice(['Thur','Fri','Sat','Sun'], size=n, p=[0.25,0.2,0.3,0.25])\n",
+    "sex = np.random.choice(['Male','Female'], size=n)\n",
+    "smoker = np.random.choice(['Yes','No'], size=n, p=[0.3,0.7])\n",
+    "total_bill = np.round(np.random.normal(loc=24, scale=8, size=n).clip(5, 80), 2)\n",
+    "tip = np.round((total_bill * np.random.uniform(0.08, 0.22, size=n)), 2)\n",
+    "\n",
+    "df = pd.DataFrame({\n",
+    "    'day': days,\n",
+    "    'sex': sex,\n",
+    "    'smoker': smoker,\n",
+    "    'total_bill': total_bill,\n",
+    "    'tip': tip\n",
+    "})\n",
+    "df.head()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Q1) Matplotlib Scatter\n",
+    "Create a scatter plot of `total_bill` (x) vs `tip` (y) using **Matplotlib**.\n",
+    "- Put the **x label**: `Total Bill ($)`\n",
+    "- Put the **y label**: `Tip ($)`\n",
+    "- Title should contain the word **\"Scatter\"**\n",
+    "- Save the fig object in a variable named **`fig1`**, axes in **`ax1`**\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# TODO: create fig1, ax1, draw scatter, 
set labels and title\n", + "fig1, ax1 = ... # TODO\n", + "\n", + "# πŸ”’ Test\n", + "check_axes_instance(ax1)\n", + "check_xlabel(ax1, 'Total Bill ($)')\n", + "check_ylabel(ax1, 'Tip ($)')\n", + "check_title_contains(ax1, 'Scatter')\n", + "check_num_collections(ax1, 1) # one scatter collection\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q2) Seaborn Boxplot\n", + "Using **Seaborn**, create a **boxplot** of `tip` by `day` (x=`day`, y=`tip`).\n", + "- Store the Axes in a variable named **`ax2`**\n", + "- x label must be `Day`, y label must be `Tip ($)`\n", + "- Title should contain the word **\"Box\"**\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: create ax2 using seaborn.boxplot\n", + "ax2 = ... # TODO\n", + "...\n", + "\n", + "# πŸ”’ Test\n", + "check_axes_instance(ax2)\n", + "check_xlabel(ax2, 'Day')\n", + "check_ylabel(ax2, 'Tip ($)')\n", + "check_title_contains(ax2, 'Box')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q3) Matplotlib Histogram\n", + "Create a **histogram** of `total_bill` with **10 bins** using Matplotlib.\n", + "- Save fig as **`fig3`**, axes as **`ax3`**\n", + "- Title should contain **\"Histogram\"**\n", + "- x label `Total Bill ($)`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: histogram with 10 bins\n", + "fig3, ax3 = ... # TODO\n", + "...\n", + "\n", + "# πŸ”’ Test\n", + "check_axes_instance(ax3)\n", + "check_title_contains(ax3, 'Histogram')\n", + "check_xlabel(ax3, 'Total Bill ($)')\n", + "check_num_patches(ax3, 10)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q4) Seaborn Aggregated Barplot\n", + "Add a computed column `tip_pct = tip / total_bill * 100`. Then plot the **mean tip % by day** using Seaborn (barplot).\n", + "- Store the Axes in **`ax4`**\n", + "- There should be one bar per unique day in `df['day']`\n", + "- y label should contain the `%` sign\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: add tip_pct column and make barplot of mean tip_pct by day\n", + "...\n", + "ax4 = ... 
# TODO\n", + "...\n", + "\n", + "# πŸ”’ Test\n", + "check_axes_instance(ax4)\n", + "unique_days = sorted(df['day'].unique().tolist())\n", + "check_len(ax4.patches, len(unique_days))\n", + "check_ylabel(ax4, '%') # contains percent sign\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q5) Save Figure to File\n", + "Save the Q1 scatter figure to `outputs/fig_scatter.png` using `fig1.savefig(...)`.\n", + "- The path must be exactly `outputs/fig_scatter.png`\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: save fig1 to outputs/fig_scatter.png\n", + "...\n", + "\n", + "# πŸ”’ Test\n", + "check_file_exists('outputs/fig_scatter.png')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### βœ… Submit\n", + "- All tests above passed\n", + "- Save notebook and commit to your repo\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/checkpoints/04_plotly_intro.ipynb b/checkpoints/04_plotly_intro.ipynb new file mode 100644 index 0000000..6912439 --- /dev/null +++ b/checkpoints/04_plotly_intro.ipynb @@ -0,0 +1,237 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# βœ… Checkpoint 04 β€” Plotly Intro\n", + "\n", + "**Goal**\n", + "- Build interactive charts with Plotly (scatter, histogram, bar) using both Express and Graph Objects.\n", + "- Set titles/axis labels, count traces, and export figures to HTML.\n", + "\n", + "**Rules**\n", + "- Fill only where marked as `# TODO`.\n", + "- Do not change test cells (πŸ”’).\n", + "- Run all cells in order before submitting.\n", + "\n", + "**References**\n", + "- Plotly docs: https://plotly.com/python/\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# πŸ”§ Setup\n", + "import os\n", + "import numpy as np\n", + "import pandas as pd\n", + "import plotly.express as px\n", + "import plotly.graph_objects as go\n", + "from utils.grader import (\n", + " check_file_exists,\n", + " check_figure, check_trace_count,\n", + " check_axis_title, check_layout_title_contains,\n", + " check_bar_count, check_trace_modes\n", + ")\n", + "np.random.seed(42)\n", + "os.makedirs('outputs', exist_ok=True)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Small deterministic dataset (similar to 'tips')\n", + "n = 120\n", + "days = np.random.choice(['Thur','Fri','Sat','Sun'], size=n, p=[0.25,0.2,0.3,0.25])\n", + "sex = np.random.choice(['Male','Female'], size=n)\n", + "smoker = np.random.choice(['Yes','No'], size=n, p=[0.3,0.7])\n", + "total_bill = np.round(np.random.normal(loc=24, scale=8, size=n).clip(5, 80), 2)\n", + "tip = np.round((total_bill * np.random.uniform(0.08, 0.22, size=n)), 2)\n", + "df = pd.DataFrame({\n", + " 'day': days,\n", + " 'sex': sex,\n", + " 'smoker': smoker,\n", + " 'total_bill': total_bill,\n", + " 'tip': tip\n", + "})\n", + "df.head()\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q1) Plotly Express β€” Scatter\n", + "Create a scatter plot of 
`total_bill` (x) vs `tip` (y) using **Plotly Express**.\n", + "- Color by `day` (optional but encouraged).\n", + "- Title should contain **\"Scatter\"**.\n", + "- x-axis title: `Total Bill ($)`; y-axis title: `Tip ($)`.\n", + "- Store the figure in **`fig1`**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: create fig1 with px.scatter\n", + "fig1 = ... # TODO\n", + "# Example (for reference):\n", + "# fig1 = px.scatter(df, x='total_bill', y='tip', color='day', title='Scatter: Tip vs Total Bill')\n", + "# fig1.update_layout(xaxis_title='Total Bill ($)', yaxis_title='Tip ($)')\n", + "\n", + "# πŸ”’ Test\n", + "check_figure(fig1)\n", + "check_trace_count(fig1, expected_min=1) # at least 1 trace (color may create >1)\n", + "check_layout_title_contains(fig1, 'Scatter')\n", + "check_axis_title(fig1, axis='x', expected='Total Bill ($)')\n", + "check_axis_title(fig1, axis='y', expected='Tip ($)')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q2) Plotly Express β€” Histogram\n", + "Create a histogram of `total_bill` with **10 bins**.\n", + "- Title should contain **\"Histogram\"**.\n", + "- x-axis title: `Total Bill ($)`.\n", + "- Store the figure in **`fig2`**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: create fig2 with px.histogram and nbins=10\n", + "fig2 = ... # TODO\n", + "# Example:\n", + "# fig2 = px.histogram(df, x='total_bill', nbins=10, title='Histogram: Total Bill')\n", + "# fig2.update_layout(xaxis_title='Total Bill ($)')\n", + "\n", + "# πŸ”’ Test\n", + "check_figure(fig2)\n", + "check_trace_count(fig2, expected_min=1)\n", + "check_layout_title_contains(fig2, 'Histogram')\n", + "check_axis_title(fig2, axis='x', expected='Total Bill ($)')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q3) Plotly Express β€” Bar (mean tip%)\n", + "Add a computed column `tip_pct = tip / total_bill * 100`. Then plot the **mean tip % by day** as a bar chart.\n", + "- One bar per unique `day`.\n", + "- y-axis title should contain `%`.\n", + "- Store the figure in **`fig3`**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: compute tip_pct and create fig3\n", + "...\n", + "fig3 = ... # TODO\n", + "\n", + "# πŸ”’ Test\n", + "check_figure(fig3)\n", + "unique_days = sorted(df['day'].unique().tolist())\n", + "check_bar_count(fig3, expected=len(unique_days))\n", + "check_axis_title(fig3, axis='y', expected='%') # contains percent sign\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q4) Graph Objects β€” Line (running mean of tip)\n", + "Using **plotly.graph_objects**, build a line chart of the running mean of `tip` over row index.\n", + "- Use `go.Figure` with a single `go.Scatter` trace in `'lines'` mode.\n", + "- Title should contain **\"Running Mean\"**.\n", + "- Store the figure in **`fig4`**.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: create running mean and fig4 with go.Figure\n", + "...\n", + "fig4 = ... 
# TODO\n", + "\n", + "# πŸ”’ Test\n", + "check_figure(fig4)\n", + "check_trace_count(fig4, expected_min=1, expected_max=1)\n", + "check_trace_modes(fig4, must_include='lines')\n", + "check_layout_title_contains(fig4, 'Running Mean')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Q5) Export to HTML\n", + "Save the Q1 scatter figure to **`outputs/fig_scatter.html`** using `fig1.write_html(...)`.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# TODO: export fig1 to outputs/fig_scatter.html\n", + "...\n", + "\n", + "# πŸ”’ Test\n", + "check_file_exists('outputs/fig_scatter.html')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### βœ… Submit\n", + "- All tests above passed\n", + "- Save notebook and commit to your repo\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "name": "python", + "version": "3.11", + "mimetype": "text/x-python", + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "pygments_lexer": "ipython3", + "nbconvert_exporter": "python", + "file_extension": ".py" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/checkpoints/utils/grader.py b/checkpoints/utils/grader.py new file mode 100644 index 0000000..4ac8796 --- /dev/null +++ b/checkpoints/utils/grader.py @@ -0,0 +1,168 @@ +# utils/grader.py +import os +import numpy as np +import pandas as pd +import matplotlib +import matplotlib.pyplot as plt + +def _fail(msg): + raise AssertionError(msg) + +# Generic / NumPy / pandas +def check_array(arr, shape=None, dtype=None, allow_int_any=False): + if not isinstance(arr, np.ndarray): + _fail(f"❌ Expected numpy.ndarray, got {type(arr)}") + if shape is not None and arr.shape != shape: + _fail(f"❌ Wrong shape: expected {shape}, got {arr.shape}") + if dtype is not None: + if allow_int_any and np.issubdtype(arr.dtype, np.integer): + pass + elif not np.issubdtype(arr.dtype, dtype): + _fail(f"❌ Wrong dtype: expected {dtype}, got {arr.dtype}") + print("βœ… Array check passed.") + +def check_value(val, expected, tol=1e-8): + if isinstance(val, (float, np.floating)) or isinstance(expected, (float, np.floating)): + if abs(float(val) - float(expected)) > tol: + _fail(f"❌ Wrong value: expected {expected}, got {val}") + else: + if val != expected: + _fail(f"❌ Wrong value: expected {expected}, got {val}") + print("βœ… Value check passed.") + +def check_dataframe_columns(df, expected_cols): + if not isinstance(df, pd.DataFrame): + _fail(f"❌ Expected pandas.DataFrame, got {type(df)}") + missing = [c for c in expected_cols if c not in df.columns] + if missing: + _fail(f"❌ Missing columns: {missing}") + print("βœ… DataFrame columns check passed.") + +def check_series_index_values(s, expected_index_set): + if not isinstance(s, pd.Series): + _fail(f"❌ Expected pandas.Series, got {type(s)}") + if set(list(s.index)) != set(list(expected_index_set)): + _fail(f"❌ Unexpected index: got {list(s.index)}, expected set {list(expected_index_set)}") + print("βœ… Series index check passed.") + +def check_len(obj, expected_len): + try: + n = len(obj) + except Exception as e: + _fail(f"❌ Object has no len(): {e}") + if n != expected_len: + _fail(f"❌ Wrong length: expected {expected_len}, got {n}") + print("βœ… Length check passed.") + +def check_file_exists(path): + if not os.path.exists(path): + _fail(f"❌ File not found: {path}") + print("βœ… File exists.") + 
+# Matplotlib / Seaborn helpers +def check_axes_instance(ax): + if not hasattr(ax, "get_xlabel") or not hasattr(ax, "get_ylabel"): + _fail(f"❌ Expected a Matplotlib Axes-like object, got {type(ax)}") + print("βœ… Axes instance check passed.") + +def check_xlabel(ax, expected): + label = ax.get_xlabel() + if label != expected and expected not in label: + _fail(f"❌ X label mismatch. Got '{label}', expected '{expected}' (or containing it).") + print("βœ… X label ok.") + +def check_ylabel(ax, expected): + label = ax.get_ylabel() + if label != expected and expected not in label: + _fail(f"❌ Y label mismatch. Got '{label}', expected '{expected}' (or containing it).") + print("βœ… Y label ok.") + +def check_title_contains(ax, keyword): + title = ax.get_title() + if keyword not in title: + _fail(f"❌ Title does not contain '{keyword}'. Got '{title}'") + print("βœ… Title contains keyword.") + +def check_num_lines(ax, expected_n): + n = len(ax.lines) + if n != expected_n: + _fail(f"❌ Expected {expected_n} line(s), got {n}") + print("βœ… Number of lines ok.") + +def check_num_collections(ax, expected_n): + n = len(ax.collections) + if n != expected_n: + _fail(f"❌ Expected {expected_n} collection(s), got {n}") + print("βœ… Number of collections ok.") + +def check_num_patches(ax, expected_n): + n = len(ax.patches) + if n != expected_n: + _fail(f"❌ Expected {expected_n} patch(es), got {n}") + print("βœ… Number of patches ok.") + +# Plotly helpers +def check_figure(fig): + try: + import plotly.graph_objects as go + except Exception as e: + _fail(f"❌ Plotly not installed: {e}") + if not isinstance(fig, go.Figure): + _fail(f"❌ Expected plotly.graph_objects.Figure, got {type(fig)}") + print("βœ… Figure instance ok.") + +def check_trace_count(fig, expected_min=None, expected_max=None): + n = len(fig.data) + if expected_min is not None and n < expected_min: + _fail(f"❌ Too few traces: got {n}, expected >= {expected_min}") + if expected_max is not None and n > expected_max: + _fail(f"❌ Too many traces: got {n}, expected <= {expected_max}") + print("βœ… Trace count ok.") + +def _get_axis(fig, axis): + if axis == 'x': + return fig.layout.xaxis + elif axis == 'y': + return fig.layout.yaxis + else: + _fail("❌ axis must be 'x' or 'y'") + +def check_axis_title(fig, axis='x', expected=None): + ax = _get_axis(fig, axis) + title = getattr(ax.title, "text", "") if ax.title else "" + if expected is None: + _fail("❌ expected title text is None") + if expected != title and (expected not in title): + _fail(f"❌ {axis}-axis title mismatch. Got '{title}', expected '{expected}' (or containing it).") + print(f"βœ… {axis.upper()} axis title ok.") + +def check_layout_title_contains(fig, keyword): + title = getattr(fig.layout.title, "text", "") if fig.layout.title else "" + if keyword not in title: + _fail(f"❌ Layout title does not contain '{keyword}'. 
Got '{title}'") + print("βœ… Layout title contains keyword.") + +def check_bar_count(fig, expected): + if len(fig.data) == 0: + _fail("❌ No traces in figure.") + trace = fig.data[0] + xs = getattr(trace, "x", None) + if xs is None: + _fail("❌ Bar trace has no x values.") + n = len(xs) + if n != expected: + _fail(f"❌ Expected {expected} bars, got {n}") + print("βœ… Bar count ok.") + +def check_trace_modes(fig, must_include='lines'): + if len(fig.data) == 0: + _fail("❌ No traces in figure.") + modes = [] + for t in fig.data: + mode = getattr(t, "mode", None) + if mode: + modes.append(mode) + joined = ",".join(modes) + if must_include not in joined: + _fail(f"❌ Required mode '{must_include}' not found in traces. Got modes: {modes}") + print("βœ… Trace mode ok.") diff --git a/libraries.md b/libraries.md new file mode 100644 index 0000000..6744b24 --- /dev/null +++ b/libraries.md @@ -0,0 +1,51 @@ +# πŸ“š Top 26 Python Libraries for Data Science + + + +## Staple Python Libraries for Data Science +1. **NumPy** – Core numerical computing library in Python, offering fast operations on multi-dimensional arrays and matrices, essential for scientific computing and linear algebra. +2. **pandas** – Powerful data analysis/manipulation tool providing DataFrame structures, easy I/O with multiple file formats, and advanced indexing, grouping, and time series functionality. +3. **Matplotlib** – Fundamental plotting library for creating static, interactive, and animated visualizations with full customization. +4. **Seaborn** – High-level statistical visualization library built on Matplotlib, offering attractive and informative default styles for complex plots. +5. **Plotly** – Interactive graphing library for web-based visualizations, supporting 3D charts and dashboards via Dash. +6. **scikit-learn** – Comprehensive machine learning library for classification, regression, clustering, and preprocessing, with a consistent API. + + +
+ +## Machine Learning Python Libraries +7. **LightGBM** – Gradient boosting framework optimized for speed, memory efficiency, and accuracy, supporting large-scale and GPU-based learning. +8. **XGBoost** – Widely used gradient boosting library known for performance in Kaggle competitions, supporting distributed training and multiple platforms. +9. **CatBoost** – High-performance gradient boosting library with strong categorical feature handling and excellent CPU/GPU support. +10. **Statsmodels** – Statistical modeling library for regression, hypothesis testing, and time series analysis, with an R-like interface. +11. **RAPIDS cuDF/cuML** – NVIDIA GPU-accelerated libraries for DataFrame manipulation and machine learning with pandas- and scikit-learn-like APIs. +12. **Optuna** – Hyperparameter optimization framework with efficient algorithms, pruning, and visualization tools. + +
+ + +## Automated Machine Learning Python Libraries +13. **PyCaret** – Low-code machine learning library automating the end-to-end ML workflow for rapid experimentation. +14. **H2O** – Scalable ML platform for big data, supporting distributed computing and AutoML. +15. **TPOT** – AutoML tool using genetic programming to optimize ML pipelines automatically. +16. **auto-sklearn** – Automated model selection and hyperparameter tuning built on scikit-learn with Bayesian optimization. +17. **FLAML** – Lightweight AutoML library focused on finding accurate models quickly with minimal computational cost. + +
+ + +## Deep Learning Python Libraries +18. **TensorFlow** – Google’s open-source ML framework for scalable deep learning, offering APIs for building, training, and deploying models. +19. **PyTorch** – Facebook’s deep learning framework known for dynamic computation graphs, ease of use, and strong research-to-production transition. +20. **FastAI** – High-level deep learning library on PyTorch with concise APIs for state-of-the-art results. +21. **Keras** – User-friendly deep learning API integrated with TensorFlow, designed for quick prototyping and experimentation. +22. **PyTorch Lightning** – Lightweight wrapper for PyTorch that organizes code for reproducibility and scalability. + + +
+ +## Natural Language Processing Python Libraries +23. **NLTK** – Comprehensive NLP toolkit for tokenization, parsing, and linguistic processing with access to corpora like WordNet. +24. **spaCy** – Industrial-strength NLP library for large-scale text processing, supporting deep learning integration and 60+ languages. +25. **Gensim** – Topic modeling and vector space modeling library optimized for large corpora and memory efficiency. +26. **Hugging Face Transformers** – Library for state-of-the-art transformer-based models for text, vision, audio, and multimodal tasks, supporting PyTorch, TensorFlow, and JAX.