A Movie Recommender System

Introduction

Likelihood

This is a recommender system using Bayesian Personalized Ranking and Expectation-Maximization (EM). The naive Bayes model of movie ratings is represented by the belief network shown below, with hidden variable $Z ∈ {1, 2, \dots,k}$ and partially observed binary variables $R_1, R_2,\dots, R_n$ (corresponding to movie ratings).

This model assumes that there are $k$ different types of movie-goers, and that the $i^{th}$ type of movie-goer - who represents a fraction $P(Z = i)$ of the overall population—likes the $j^{th}$ movie with conditional probability $P(R_j = 1|Z = i)$. Let $\Omega_t$ denote the set of movies seen (and hence rated) by the $t^{th}$ movie-goer. Show that the likelihood of the $t^{th}$ audience's ratings is given by

$$P\left(\left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right) = \sum_{i=1}^k P(Z = i) \prod_{j \in \Omega_t} P\left(R_j = r_j^{(t)} | Z = i\right)$$

E-step (Expectation)

The E-step of this model is to compute, for each audience, the posterior probability that he or she corresponds to a particular type of movie-goer.

$$P\left(Z = i \left| \left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right.\right) = \frac{P(Z = i) \prod_{j \in \Omega_t} P\left(R_j = r_j^{(t)} | Z = i\right)}{\sum_{i'=1}^k P(Z = i') \prod_{j \in \Omega_t} P\left(R_j = r_j^{(t)} | Z = i'\right)}$$

M-step (Maximization)

The M-step of the model is to re-estimate the probabilities $P(Z = i)$ and $P(R_j = 1|Z = i)$ that define the Conditional Probability Tables (CPT) of the belief network. As shorthand, let

$$\rho_{it} = P\left(Z = i \left| \left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right.\right)$$

denote the probabilities computed in the E-step of the algorithm. Also, let $T$ denote the number of audience. The EM updates are given by

$$P(Z = i) \leftarrow \frac{1}{T} \sum_{t=1}^T \rho_{it}$$

$$P(R_j = 1|Z = i) \leftarrow \frac{\sum_{\{t|j \in \Omega_t\}} \rho_{it} I\left(r_j^{(t)}, 1\right) + \sum_{\{t|j \notin \Omega_t\}} \rho_{it} P(R_j = 1|Z = i)}{\sum_{t=1}^T \rho_{it}}$$

Implementation

Training

We use files MovieRec_probZ_init.txt and MovieRec_probR_init.txt to initialize the probabilities $P(Z = i)$ and $P(R_j = 1|Z = i)$ for a model with $k = 4$ types of movie-goers. Run 256 iterations of the EM algorithm, computing the (normalized) log-likelihood

$$L = \frac{1}{T} \sum_{t=1}^T \log P\left(\left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right)$$

at each iteration. We can find the increasing log-likelihood over the iterations of optimization, indicating the fitting trend during model training.

Personalized movie recommendation

Find any audience ID in MovieRec_ids.txt to determine the row of the ratings matrix that stores their personal data. Compute the posterior probability for this row from the trained model, and then compute the expected ratings on the movies the audience hasn't yet seen:

$$P\left(R_\ell = 1 \left| \left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right.\right) = \sum_{i=1}^k P\left(Z = i \left| \left\{R_j = r_j^{(t)}\right\}_{j \in \Omega_t}\right.\right) P(R_\ell = 1|Z = i) \text{ for } \ell \notin \Omega_t$$

The sample output is a list of unseen movies sorted by their expected ratings for the selected audience.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
MovieRec.ipynb		MovieRec.ipynb
MovieRec_ids-2.txt		MovieRec_ids-2.txt
MovieRec_movies-1.txt		MovieRec_movies-1.txt
MovieRec_probR_init.txt		MovieRec_probR_init.txt
MovieRec_probZ_init.txt		MovieRec_probZ_init.txt
MovieRec_ratings-1.txt		MovieRec_ratings-1.txt
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

A Movie Recommender System

Introduction

Likelihood

E-step (Expectation)

M-step (Maximization)

Implementation

Training

Personalized movie recommendation

About

Uh oh!

Releases

Packages

Languages

Eric0627/MovieRec

Folders and files

Latest commit

History

Repository files navigation

A Movie Recommender System

Introduction

Likelihood

E-step (Expectation)

M-step (Maximization)

Implementation

Training

Personalized movie recommendation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages