Skip to content

Eric0627/MovieRec

Repository files navigation

A Movie Recommender System

Introduction

Likelihood

This is a recommender system using Bayesian Personalized Ranking and Expectation-Maximization (EM). The naive Bayes model of movie ratings is represented by the belief network shown below, with hidden variable $Z ∈ {1, 2, \dots,k}$ and partially observed binary variables $R_1, R_2,\dots, R_n$ (corresponding to movie ratings).

Screenshot 2025-09-24 at 10 32 44 AM

This model assumes that there are $k$ different types of movie-goers, and that the $i^{th}$ type of movie-goer - who represents a fraction $P(Z = i)$ of the overall population—likes the $j^{th}$ movie with conditional probability $P(R_j = 1|Z = i)$. Let $\Omega_t$ denote the set of movies seen (and hence rated) by the $t^{th}$ movie-goer. Show that the likelihood of the $t^{th}$ audience's ratings is given by

$$P\left(\left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right) = \sum_{i=1}^k P(Z = i) \prod_{j \in \Omega_t} P\left(R_j = r_j^{(t)} | Z = i\right)$$

E-step (Expectation)

The E-step of this model is to compute, for each audience, the posterior probability that he or she corresponds to a particular type of movie-goer.

$$P\left(Z = i \left| \left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right.\right) = \frac{P(Z = i) \prod_{j \in \Omega_t} P\left(R_j = r_j^{(t)} | Z = i\right)}{\sum_{i'=1}^k P(Z = i') \prod_{j \in \Omega_t} P\left(R_j = r_j^{(t)} | Z = i'\right)}$$

M-step (Maximization)

The M-step of the model is to re-estimate the probabilities $P(Z = i)$ and $P(R_j = 1|Z = i)$ that define the Conditional Probability Tables (CPT) of the belief network. As shorthand, let

$$\rho_{it} = P\left(Z = i \left| \left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right.\right)$$

denote the probabilities computed in the E-step of the algorithm. Also, let $T$ denote the number of audience. The EM updates are given by

$$P(Z = i) \leftarrow \frac{1}{T} \sum_{t=1}^T \rho_{it}$$ $$P(R_j = 1|Z = i) \leftarrow \frac{\sum_{\{t|j \in \Omega_t\}} \rho_{it} I\left(r_j^{(t)}, 1\right) + \sum_{\{t|j \notin \Omega_t\}} \rho_{it} P(R_j = 1|Z = i)}{\sum_{t=1}^T \rho_{it}}$$

Implementation

Training

We use files MovieRec_probZ_init.txt and MovieRec_probR_init.txt to initialize the probabilities $P(Z = i)$ and $P(R_j = 1|Z = i)$ for a model with $k = 4$ types of movie-goers. Run 256 iterations of the EM algorithm, computing the (normalized) log-likelihood

$$L = \frac{1}{T} \sum_{t=1}^T \log P\left(\left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right)$$

at each iteration. We can find the increasing log-likelihood over the iterations of optimization, indicating the fitting trend during model training.

Screenshot 2025-09-24 at 11 07 34 AM

Personalized movie recommendation

Find any audience ID in MovieRec_ids.txt to determine the row of the ratings matrix that stores their personal data. Compute the posterior probability for this row from the trained model, and then compute the expected ratings on the movies the audience hasn't yet seen:

$$P\left(R_\ell = 1 \left| \left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right.\right) = \sum_{i=1}^k P\left(Z = i \left| \left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right.\right) P(R_\ell = 1|Z = i) \text{ for } \ell \notin \Omega_t$$

The sample output is a list of unseen movies sorted by their expected ratings for the selected audience.

Screenshot 2025-09-24 at 11 09 07 AM

About

A Movie Recommender System

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published