Commit ae2788d

Add solution and explanation for problem 3764: Most Common Course Pairs
1 parent b7026d0 commit ae2788d

3 files changed: +185 -0 lines changed
data/leetcode-problems.json

Lines changed: 7 additions & 0 deletions
@@ -26339,5 +26339,12 @@
     "title": "Maximum Substrings With Distinct Start",
     "difficulty": "Medium",
     "link": "https://leetcode.com/problems/maximum-substrings-with-distinct-start/"
+  },
+  "3764": {
+    "id": 3764,
+    "category": "Database",
+    "title": "Most Common Course Pairs",
+    "difficulty": "Hard",
+    "link": "https://leetcode.com/problems/most-common-course-pairs/"
   }
 }

explanations/3764/en.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
## Explanation

### Strategy

**Restate the problem**

We need to analyze course completion data to find the most common learning pathways among top-performing students. A "pathway" is a pair of consecutive courses that a student completed. We only consider students who completed at least 5 courses with an average rating of 4 or higher.

**1.1 Constraints & Complexity**

- **Input Size:** The `course_completions` table has N rows, where each row represents one course completion by a user.
- **Time Complexity:** O(N log N) - We need to:
  - Filter top performers: O(N) for grouping and aggregation
  - Order courses chronologically: O(N log N) for window function sorting
  - Join to create pairs: O(N) for the self-join
  - Group and count pairs: O(N) for aggregation
  - Final sorting: O(P log P) where P is the number of unique pairs (typically much smaller than N)
  - Overall: O(N log N), dominated by the sorting step. The O(N) figures assume hash-based grouping and joins; the actual plan depends on the database engine.
- **Space Complexity:** O(N) - We store intermediate results in CTEs (top performers list, ordered courses, and course pairs)
- **Edge Case:** If no students meet the top performer criteria (at least 5 courses with average rating >= 4), the result will be an empty table.

**1.2 High-level approach**

The goal is to identify which course transitions are most popular among high-achieving students. We break this into three main steps: first, filter to only top performers; second, order each student's courses by completion date; third, extract consecutive pairs and count their frequencies.

![Course pathway visualization showing students progressing through courses with arrows indicating transitions]

**1.3 Brute force vs. optimized strategy**

- **Brute Force:** For each student, check if they qualify as a top performer. Then, for each qualifying student, manually extract all consecutive course pairs by comparing every course with every other course. This requires nested loops and results in O(N²) time complexity.
- **Optimized Strategy:** Use the SQL window function `ROW_NUMBER` to order courses chronologically, then use a self-join on the resulting order numbers to create consecutive pairs. This leverages SQL's optimized join and aggregation operations, resulting in O(N log N) time complexity.
- **Why this helps:** By using window functions and structured CTEs, we let the database engine handle sorting and joining, avoiding manual iteration and reducing both code complexity and execution time.

**1.4 Decomposition**

1. **Identify Top Performers:** Group rows by `user_id`, count each student's courses, and calculate the average rating. Keep only students with at least 5 courses and an average rating >= 4.
2. **Order Courses Chronologically:** For each top performer, assign a sequential number to their courses based on `completion_date` using a window function.
3. **Create Consecutive Pairs:** Join the ordered courses table with itself, matching each course to the next course in sequence (where the order number differs by exactly 1).
4. **Count Pair Frequencies:** Group the pairs by `first_course` and `second_course`, counting how many times each transition occurs.
5. **Sort Results:** Order by `transition_count` descending, then by course names ascending.

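Step 1 of the decomposition can be tried in isolation. The sketch below loads a small table into an in-memory SQLite database through Python's `sqlite3` module and applies the `GROUP BY ... HAVING` filter. The course sequences follow the example in section 2.1, but the ratings, the dates, and the `SAMPLE` and `rows` names are invented for illustration; the real table may have more columns than the four used here.

```python
import sqlite3

# Hypothetical sample data: sequences follow the walkthrough in section 2.1,
# ratings and dates are invented (User 3 averages 2.8, User 4 has 3 courses).
SAMPLE = {
    1: (["Python Basics", "SQL Fundamentals", "JavaScript",
         "React Basics", "Node.js", "Docker"], [5, 4, 5, 4, 5, 4]),
    2: (["Python Basics", "React Basics", "Node.js",
         "Docker", "AWS Fundamentals"], [4, 4, 5, 4, 5]),
    3: (["Python Basics", "SQL Fundamentals", "JavaScript",
         "React Basics", "Node.js"], [3, 3, 3, 3, 2]),
    4: (["Python Basics", "Data Science", "Machine Learning"], [5, 5, 5]),
}
rows = [
    (user_id, course, f"2024-01-{day:02d}", rating)
    for user_id, (courses, ratings) in SAMPLE.items()
    for day, (course, rating) in enumerate(zip(courses, ratings), start=1)
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course_completions ("
    "user_id INTEGER, course_name TEXT, completion_date TEXT, course_rating INTEGER)"
)
conn.executemany("INSERT INTO course_completions VALUES (?, ?, ?, ?)", rows)

# Step 1: keep students with at least 5 completions and an average rating >= 4.
top_performers = [
    row[0]
    for row in conn.execute(
        """
        SELECT user_id
        FROM course_completions
        GROUP BY user_id
        HAVING COUNT(*) >= 5 AND AVG(course_rating) >= 4
        ORDER BY user_id
        """
    )
]
print(top_performers)
```

With this made-up data, only Users 1 and 2 pass both conditions: User 3 fails the rating threshold and User 4 fails the course count.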
### Steps

**2.1 Initialization & Example Setup**

Let's use the example data from the problem:

```
User 1: Python Basics → SQL Fundamentals → JavaScript → React Basics → Node.js → Docker
User 2: Python Basics → React Basics → Node.js → Docker → AWS Fundamentals
User 3: Python Basics → SQL Fundamentals → JavaScript → React Basics → Node.js (doesn't qualify - avg rating 2.8)
User 4: Python Basics → Data Science → Machine Learning (doesn't qualify - only 3 courses)
```

After filtering to top performers (Users 1 and 2), we have:
- **Top Performers Set:** {User 1, User 2}
- **User 1's ordered courses:**
  - Order 1: Python Basics
  - Order 2: SQL Fundamentals
  - Order 3: JavaScript
  - Order 4: React Basics
  - Order 5: Node.js
  - Order 6: Docker
- **User 2's ordered courses:**
  - Order 1: Python Basics
  - Order 2: React Basics
  - Order 3: Node.js
  - Order 4: Docker
  - Order 5: AWS Fundamentals

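The ordered lists above can be reproduced with `ROW_NUMBER()`. The following is a minimal sketch using Python's `sqlite3` module, assuming the bundled SQLite is version 3.25 or newer (the first release with window functions). The ratings and dates are invented, and the rows are inserted in reverse order on purpose so that the numbering visibly comes from `completion_date` rather than from insertion order.

```python
import sqlite3

# Hypothetical sample data: sequences follow the example above,
# ratings and dates are invented.
SAMPLE = {
    1: (["Python Basics", "SQL Fundamentals", "JavaScript",
         "React Basics", "Node.js", "Docker"], [5, 4, 5, 4, 5, 4]),
    2: (["Python Basics", "React Basics", "Node.js",
         "Docker", "AWS Fundamentals"], [4, 4, 5, 4, 5]),
    3: (["Python Basics", "SQL Fundamentals", "JavaScript",
         "React Basics", "Node.js"], [3, 3, 3, 3, 2]),
    4: (["Python Basics", "Data Science", "Machine Learning"], [5, 5, 5]),
}
rows = [
    (user_id, course, f"2024-01-{day:02d}", rating)
    for user_id, (courses, ratings) in SAMPLE.items()
    for day, (course, rating) in enumerate(zip(courses, ratings), start=1)
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course_completions ("
    "user_id INTEGER, course_name TEXT, completion_date TEXT, course_rating INTEGER)"
)
# Insert in reverse so the stored order does not match the chronological order.
conn.executemany(
    "INSERT INTO course_completions VALUES (?, ?, ?, ?)", list(reversed(rows))
)

# Number each top performer's courses by completion date.
ordered = conn.execute(
    """
    SELECT user_id,
           course_name,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY completion_date
           ) AS course_order
    FROM course_completions
    WHERE user_id IN (
        SELECT user_id
        FROM course_completions
        GROUP BY user_id
        HAVING COUNT(*) >= 5 AND AVG(course_rating) >= 4
    )
    ORDER BY user_id, course_order
    """
).fetchall()

user2_courses = [(order, course) for user_id, course, order in ordered if user_id == 2]
for order, course in user2_courses:
    print(f"Order {order}: {course}")
```

The `WHERE ... IN` subquery stands in for the join against a `top_performers` CTE; either form restricts the numbering to qualifying students.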
**2.2 Start Checking/Processing**

We create consecutive pairs by joining each course with the next course in the same student's sequence: a pair is kept when the second course's `course_order` equals the first course's `course_order + 1`. User 1's six courses, for example, yield five pairs.

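The self-join on `course_order + 1` is one way to build the pairs. A common alternative, which is not what this commit uses, is the `LEAD()` window function, which reads the next row's value directly. The sketch below shows that variant under the same assumptions as before: SQLite 3.25 or newer, invented ratings and dates, and a `course_name` column that is never NULL.

```python
import sqlite3

# Hypothetical sample data: sequences follow the walkthrough,
# ratings and dates are invented.
SAMPLE = {
    1: (["Python Basics", "SQL Fundamentals", "JavaScript",
         "React Basics", "Node.js", "Docker"], [5, 4, 5, 4, 5, 4]),
    2: (["Python Basics", "React Basics", "Node.js",
         "Docker", "AWS Fundamentals"], [4, 4, 5, 4, 5]),
    3: (["Python Basics", "SQL Fundamentals", "JavaScript",
         "React Basics", "Node.js"], [3, 3, 3, 3, 2]),
    4: (["Python Basics", "Data Science", "Machine Learning"], [5, 5, 5]),
}
rows = [
    (user_id, course, f"2024-01-{day:02d}", rating)
    for user_id, (courses, ratings) in SAMPLE.items()
    for day, (course, rating) in enumerate(zip(courses, ratings), start=1)
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course_completions ("
    "user_id INTEGER, course_name TEXT, completion_date TEXT, course_rating INTEGER)"
)
conn.executemany("INSERT INTO course_completions VALUES (?, ?, ?, ?)", rows)

# LEAD() returns the course_name of the next row in the same partition,
# or NULL for a student's last course, which the outer filter drops.
pairs = conn.execute(
    """
    WITH top_performers AS (
        SELECT user_id
        FROM course_completions
        GROUP BY user_id
        HAVING COUNT(*) >= 5 AND AVG(course_rating) >= 4
    ),
    with_next AS (
        SELECT cc.course_name AS first_course,
               LEAD(cc.course_name) OVER (
                   PARTITION BY cc.user_id
                   ORDER BY cc.completion_date
               ) AS second_course
        FROM course_completions cc
        INNER JOIN top_performers tp ON cc.user_id = tp.user_id
    )
    SELECT first_course, second_course
    FROM with_next
    WHERE second_course IS NOT NULL
    """
).fetchall()

print(len(pairs), "consecutive pairs")
```

Both forms sort each partition once; `LEAD()` avoids the second scan of the ordered rows, while the self-join makes the "order differs by exactly 1" condition explicit.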
**2.3 Trace Walkthrough**

Let's trace how pairs are created:

| User | First Course | Second Course | Pair Created |
|------|--------------|---------------|--------------|
| 1 | Python Basics (order 1) | SQL Fundamentals (order 2) | Python Basics → SQL Fundamentals |
| 1 | SQL Fundamentals (order 2) | JavaScript (order 3) | SQL Fundamentals → JavaScript |
| 1 | JavaScript (order 3) | React Basics (order 4) | JavaScript → React Basics |
| 1 | React Basics (order 4) | Node.js (order 5) | React Basics → Node.js |
| 1 | Node.js (order 5) | Docker (order 6) | Node.js → Docker |
| 2 | Python Basics (order 1) | React Basics (order 2) | Python Basics → React Basics |
| 2 | React Basics (order 2) | Node.js (order 3) | React Basics → Node.js |
| 2 | Node.js (order 3) | Docker (order 4) | Node.js → Docker |
| 2 | Docker (order 4) | AWS Fundamentals (order 5) | Docker → AWS Fundamentals |

**2.4 Count and Aggregate**

After creating all pairs, we count their frequencies:

| First Course | Second Course | Count |
|--------------|---------------|-------|
| Node.js | Docker | 2 |
| React Basics | Node.js | 2 |
| Docker | AWS Fundamentals | 1 |
| JavaScript | React Basics | 1 |
| Python Basics | React Basics | 1 |
| Python Basics | SQL Fundamentals | 1 |
| SQL Fundamentals | JavaScript | 1 |

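The counts in this table can be cross-checked outside SQL. The sketch below is plain Python rather than the commit's approach: it takes the two top performers' ordered course lists from section 2.1 and counts adjacent pairs with `collections.Counter`. The variable names are illustrative.

```python
from collections import Counter

# Ordered course lists for the two top performers, copied from section 2.1.
top_performer_paths = {
    1: ["Python Basics", "SQL Fundamentals", "JavaScript",
        "React Basics", "Node.js", "Docker"],
    2: ["Python Basics", "React Basics", "Node.js",
        "Docker", "AWS Fundamentals"],
}

# zip(courses, courses[1:]) pairs each course with the one right after it,
# which is the same relation as course_order + 1 in the SQL self-join.
pair_counts = Counter(
    pair
    for courses in top_performer_paths.values()
    for pair in zip(courses, courses[1:])
)

for (first, second), count in sorted(pair_counts.items()):
    print(f"{first} -> {second}: {count}")
```

Nine pairs collapse into seven distinct transitions, with `Node.js → Docker` and `React Basics → Node.js` each appearing twice.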
**2.5 Return Result**

The final result is ordered by `transition_count` descending, then by `first_course` ascending, then by `second_course` ascending:

| first_course | second_course | transition_count |
|--------------|---------------|------------------|
| Node.js | Docker | 2 |
| React Basics | Node.js | 2 |
| Docker | AWS Fundamentals | 1 |
| JavaScript | React Basics | 1 |
| Python Basics | React Basics | 1 |
| Python Basics | SQL Fundamentals | 1 |
| SQL Fundamentals | JavaScript | 1 |

> **Note:** The window function `ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY completion_date)` numbers each student's courses sequentially in completion order, which is what lets the self-join identify consecutive pairs correctly. If a student has two completions with the same `completion_date`, their relative numbering is not guaranteed unless a tiebreaker column is added to the `ORDER BY`.

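If two completions by the same student can share a `completion_date`, `ROW_NUMBER()` may number them in either order, so the resulting pairs can vary between runs or engines. One hedge is a secondary sort key. The sketch below uses `course_name` as that key purely for illustration; whether ties occur in the actual problem data, and which column would be the right tiebreaker, are assumptions not confirmed by the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course_completions ("
    "user_id INTEGER, course_name TEXT, completion_date TEXT)"
)
# Made-up rows: Node.js and Docker are completed on the same day.
conn.executemany(
    "INSERT INTO course_completions VALUES (?, ?, ?)",
    [
        (1, "Node.js", "2024-01-02"),
        (1, "Docker", "2024-01-02"),
        (1, "Python Basics", "2024-01-01"),
    ],
)

# Adding course_name as a secondary key makes the numbering deterministic.
numbered = conn.execute(
    """
    SELECT course_name,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY completion_date, course_name
           ) AS course_order
    FROM course_completions
    ORDER BY course_order
    """
).fetchall()

print(numbered)
```

With the tiebreaker in place, the same-day courses are always numbered alphabetically, so the pairs derived from them are stable.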

solutions/3764/01.sql

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
-- Solution for LeetCode 3764: Most Common Course Pairs
-- Find skill mastery pathways by analyzing course completion sequences among top-performing students

WITH top_performers AS (
    -- Step 1: Identify top-performing students
    -- Must have at least 5 courses with average rating >= 4
    SELECT
        user_id
    FROM
        course_completions
    GROUP BY
        user_id
    HAVING
        COUNT(*) >= 5
        AND AVG(course_rating) >= 4
),
ordered_courses AS (
    -- Step 2: Get courses for top performers in chronological order
    SELECT
        cc.user_id,
        cc.course_name,
        cc.completion_date,
        ROW_NUMBER() OVER (
            PARTITION BY cc.user_id
            ORDER BY cc.completion_date
        ) AS course_order
    FROM
        course_completions cc
    INNER JOIN
        top_performers tp ON cc.user_id = tp.user_id
),
course_pairs AS (
    -- Step 3: Create consecutive course pairs
    SELECT
        oc1.course_name AS first_course,
        oc2.course_name AS second_course
    FROM
        ordered_courses oc1
    INNER JOIN
        ordered_courses oc2
        ON oc1.user_id = oc2.user_id
        AND oc2.course_order = oc1.course_order + 1
)
-- Step 4: Count pair frequencies and order results
SELECT
    first_course,
    second_course,
    COUNT(*) AS transition_count
FROM
    course_pairs
GROUP BY
    first_course,
    second_course
ORDER BY
    transition_count DESC,
    first_course ASC,
    second_course ASC;

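To sanity-check the query end to end, it can be run against a small in-memory SQLite database from Python. This is a sketch rather than part of the commit: the course sequences follow the walkthrough in `explanations/3764/en.md`, while the ratings, dates, and the `SAMPLE`, `rows`, and `QUERY` names are invented. It assumes SQLite 3.25 or newer for `ROW_NUMBER()`.

```python
import sqlite3

# Hypothetical sample data: sequences follow the walkthrough,
# ratings and dates are invented.
SAMPLE = {
    1: (["Python Basics", "SQL Fundamentals", "JavaScript",
         "React Basics", "Node.js", "Docker"], [5, 4, 5, 4, 5, 4]),
    2: (["Python Basics", "React Basics", "Node.js",
         "Docker", "AWS Fundamentals"], [4, 4, 5, 4, 5]),
    3: (["Python Basics", "SQL Fundamentals", "JavaScript",
         "React Basics", "Node.js"], [3, 3, 3, 3, 2]),
    4: (["Python Basics", "Data Science", "Machine Learning"], [5, 5, 5]),
}
rows = [
    (user_id, course, f"2024-01-{day:02d}", rating)
    for user_id, (courses, ratings) in SAMPLE.items()
    for day, (course, rating) in enumerate(zip(courses, ratings), start=1)
]

# The solution query from solutions/3764/01.sql, condensed onto fewer lines.
QUERY = """
WITH top_performers AS (
    SELECT user_id
    FROM course_completions
    GROUP BY user_id
    HAVING COUNT(*) >= 5 AND AVG(course_rating) >= 4
),
ordered_courses AS (
    SELECT cc.user_id, cc.course_name, cc.completion_date,
           ROW_NUMBER() OVER (
               PARTITION BY cc.user_id ORDER BY cc.completion_date
           ) AS course_order
    FROM course_completions cc
    INNER JOIN top_performers tp ON cc.user_id = tp.user_id
),
course_pairs AS (
    SELECT oc1.course_name AS first_course, oc2.course_name AS second_course
    FROM ordered_courses oc1
    INNER JOIN ordered_courses oc2
        ON oc1.user_id = oc2.user_id
        AND oc2.course_order = oc1.course_order + 1
)
SELECT first_course, second_course, COUNT(*) AS transition_count
FROM course_pairs
GROUP BY first_course, second_course
ORDER BY transition_count DESC, first_course ASC, second_course ASC;
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course_completions ("
    "user_id INTEGER, course_name TEXT, completion_date TEXT, course_rating INTEGER)"
)
conn.executemany("INSERT INTO course_completions VALUES (?, ?, ?, ?)", rows)

result = conn.execute(QUERY).fetchall()
for first_course, second_course, transition_count in result:
    print(f"{first_course} | {second_course} | {transition_count}")
```

On this sample the output matches the table in section 2.5 of the explanation. Other engines may compare strings with a different collation, so tie ordering among course names is worth re-checking on the target database.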
