📘 Google Ads Support Optimization — ML + LLM Hybrid System
A Data Science + LLM project simulating how Google’s gTech teams optimize Ads customer operations
🚀 Overview
This project builds an end-to-end support ticket optimization pipeline inspired by real workflows inside Google Ads / gTech.
It demonstrates how machine learning, LLMs, and operational analytics can be combined to:
classify support ticket severity
extract semantic tags using lightweight LLM prompts
compute a priority score blending ML + LLM + business impact
enable smarter ticket triage, routing, and escalation
The goal is to show how data science and LLMs can directly improve customer support outcomes at scale — aligning with responsibilities in Google’s Business Data Science (gDATA) and BizOps roles.
🧠 Project Architecture
+------------------+
| Raw Support Data |
+------------------+
|
v
+------------------+
| EDA + Cleaning |
+------------------+
|
v
+-------------------------------+
| ML Severity Classifier |
| (TF-IDF + Logistic Regression)|
+-------------------------------+
|
v
+-------------------------------+
| LLM Issue Tagger (Groq LLaMA) |
| - topic tags |
| - urgency estimation |
| - concise summarization |
+-------------------------------+
|
v
+-------------------------------+
| Priority Score Engine |
| ML severity + LLM tags + |
| revenue impact (optional) |
+-------------------------------+
|
v
+------------------+
| Ranked Tickets |
+------------------+
🔍 1. Exploratory Data Analysis (EDA)
The dataset includes synthetic support tickets with:
ticket text
customer metadata (region, spend, segment)
sentiment + escalation info
timestamps
labels for severity
Notebooks provide:
✔ distribution plots
✔ correlations
✔ text length analysis
✔ severity imbalance checks
✔ baseline exploratory insights
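One of the checks above, the severity imbalance check, can be sketched in a few lines. This is an illustrative toy sample, not the project's actual dataset; the notebooks run the same idea over `raw_support_tickets.xlsx`:

```python
from collections import Counter

# Illustrative severity labels; the notebooks compute this on the real dataset.
severities = ["low", "low", "medium", "high", "low", "medium", "low", "high"]

counts = Counter(severities)
total = sum(counts.values())
# Percentage share of each class, to spot imbalance before training
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}
print(shares)  # {'low': 50.0, 'medium': 25.0, 'high': 25.0} on this toy sample
```

A heavily skewed distribution here would motivate class weighting or resampling in the classifier step.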
🤖 2. ML Severity Classifier
A supervised ML model predicts ticket severity levels (e.g., low, medium, high).
Key components:
TF-IDF vectorizer
Logistic Regression classifier
Pipeline stored in models/severity_classifier.pkl
Why this matters
Severity classification is the first triage step used by real gTech teams.
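A minimal sketch of the TF-IDF + Logistic Regression pipeline described above (the training texts and labels here are made up for illustration; the real model is fit on the ticket dataset and saved to `models/severity_classifier.pkl`):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Tiny illustrative training set; the project trains on raw_support_tickets.xlsx
texts = [
    "Ads account suspended, campaigns stopped, urgent help needed",
    "Billing charge looks slightly higher than expected this month",
    "How do I change the time zone on my reporting dashboard?",
    "All conversion tracking broken after the latest tag update",
]
labels = ["high", "medium", "low", "high"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(texts, labels)

pred = pipeline.predict(["campaigns suspended and not serving"])[0]
```

In the repo, the fitted pipeline would be persisted with `joblib.dump(pipeline, "models/severity_classifier.pkl")` so the priority engine can reload it without retraining.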
🧩 3. LLM Issue Module (Groq LLaMA-3.1)
A lightweight NLP layer extracts semantics from the ticket text using Groq’s high-speed inference.
LLM outputs:
A. Category classification
(Policy, Billing, Performance, Tracking, Access, etc.)
B. 1–2 sentence summary for agents
C. JSON semantic tags
{
  "billing_related": false,
  "policy_related": true,
  "performance_related": true,
  "access_security_related": false,
  "tracking_related": false,
  "urgency_hint": "high"
}
This creates rich contextual metadata that ML alone cannot capture.
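Because the LLM's JSON output is not guaranteed to be well-formed or complete, the consuming code should parse it defensively. A sketch of how that parsing might look (the helper name `parse_llm_tags` and the default values are assumptions, not the repo's actual API):

```python
import json

# Known tag keys with safe defaults (assumed schema, matching the example above)
EXPECTED_TAGS = {
    "billing_related": False,
    "policy_related": False,
    "performance_related": False,
    "access_security_related": False,
    "tracking_related": False,
    "urgency_hint": "low",
}

def parse_llm_tags(raw: str) -> dict:
    """Parse the model's JSON tag block, falling back to defaults on bad output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(EXPECTED_TAGS)
    # Keep only known keys; fill in anything the model omitted
    return {key: data.get(key, default) for key, default in EXPECTED_TAGS.items()}

raw = '{"policy_related": true, "performance_related": true, "urgency_hint": "high"}'
tags = parse_llm_tags(raw)
```

This guarantees the priority engine always receives a complete, typed tag dictionary even when the model returns partial or malformed JSON.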
🔥 4. Priority Scoring Engine (ML + LLM Hybrid)
The engine fuses multiple signals:
| Component | Description |
| --- | --- |
| ML Severity | 0–1–2 or low/med/high (baseline urgency) |
| LLM Urgency | low / medium / high |
| LLM Semantic Tags | topic-based risk cues |
| Revenue Impact | optional financial weighting |
Example output:
{
  "priority_score": 82.5,
  "components": {
    "severity_weight": 0.60,
    "llm_urgency_weight": 0.20,
    "llm_topic_weights": 0.15,
    "revenue_weight": 0.05
  }
}
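The fusion above can be sketched as a weighted sum. The weights mirror the component breakdown shown in the example output; the per-level scores and the set of "risky" tags are illustrative assumptions, not the repo's exact constants:

```python
# Illustrative score tables (assumed values, not the repo's exact constants)
SEVERITY_SCORE = {"low": 0.2, "medium": 0.6, "high": 1.0}
URGENCY_SCORE = {"low": 0.2, "medium": 0.6, "high": 1.0}
# Weights mirror the component breakdown in the example output above
WEIGHTS = {"severity": 0.60, "llm_urgency": 0.20, "llm_topics": 0.15, "revenue": 0.05}
RISKY_TAGS = ("policy_related", "access_security_related", "billing_related")

def priority_score(severity: str, tags: dict, revenue_impact: float = 0.0) -> float:
    """Blend ML severity, LLM urgency/tags, and revenue into a 0-100 score."""
    topic_risk = sum(bool(tags.get(t, False)) for t in RISKY_TAGS) / len(RISKY_TAGS)
    score = (
        WEIGHTS["severity"] * SEVERITY_SCORE[severity]
        + WEIGHTS["llm_urgency"] * URGENCY_SCORE[tags.get("urgency_hint", "low")]
        + WEIGHTS["llm_topics"] * topic_risk
        + WEIGHTS["revenue"] * min(revenue_impact, 1.0)
    )
    return round(100 * score, 1)

score = priority_score(
    "high", {"policy_related": True, "urgency_hint": "high"}, revenue_impact=0.5
)
```

Keeping the components as an explicit breakdown (as in the JSON above) makes each ranked ticket auditable: an agent can see which signal drove the score.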
Why this matters
This mirrors the real multi-signal decision logic in enterprise support systems — ensuring the right agent sees the right issue at the right time.
📊 5. Visualizations
The notebooks include:
📈 Severity Distribution
📊 Priority Score Histogram
🔥 Correlation Heatmap (LLM tags vs. Priority)
🎯 Scatter Plot (Severity vs Priority)
🧩 Component Breakdown Bar Chart
These give clear evidence of system behavior and interpretability.
📂 Repository Structure
google_ads_support_optimization/
│
├── data/
│ └── raw_support_tickets.xlsx
│
├── models/
│ └── severity_classifier.pkl
│
├── src/
│ ├── config.py
│ ├── llm_client.py
│ ├── llm_issue_module.py
│ └── priority_engine.py
│
├── notebooks/
│ ├── 01_EDA.ipynb
│ ├── 02_severity_classifier.ipynb
│ └── 03_priority_engine.ipynb
│
├── .env
├── requirements.txt
└── README.md
🛠️ Installation
1. Clone the repo
git clone https://github.com/<your-username>/google_ads_support_optimization.git
cd google_ads_support_optimization
2. Create and activate virtual environment
python -m venv .venv
.\.venv\Scripts\activate    # Windows
source .venv/bin/activate   # macOS / Linux
3. Install dependencies
pip install -r requirements.txt
4. Set up .env
GROQ_API_KEY=your_key_here
LLM_PROVIDER=groq
LLM_MODEL=llama-3.1-8b-instant
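These variables are read at startup. A minimal sketch of how `src/config.py` might pick them up, assuming the settings are exported into the process environment (e.g. via python-dotenv); the constant names are assumptions beyond the `.env` keys above:

```python
import os

# Read the .env settings, falling back to the defaults documented above
GROQ_API_KEY = os.getenv("GROQ_API_KEY", "")
LLM_PROVIDER = os.getenv("LLM_PROVIDER", "groq")
LLM_MODEL = os.getenv("LLM_MODEL", "llama-3.1-8b-instant")
```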
🧪 Run the notebooks
jupyter notebook
🎯 Why This Project Is Relevant to Google
This project demonstrates capability in:
✔ Data analysis & statistical modeling
✔ ML model development
✔ LLM integration + prompt engineering
✔ Operations optimization
✔ Cross-functional communication (summaries, explainability)
✔ Building production-ready pipelines
✔ Prioritizing high-impact business problems
It directly aligns with responsibilities in:
gTech Business Data Science (gBDS)
Business Data Scientist (gDATA)
Google Ads Strategy & Operations
BizOps / Product Operations
AI/LLM-enabled support analytics
🧵 Future Extensions
Routing classifier for assigning agent group
Real-time API for scoring new tickets
Streamlit dashboard for interactive triage
Integration with BigQuery or Vertex AI
Multi-label classification for richer taxonomy