MovieLens

Introduction

The MovieLens datasets are widely used in recommender system research. They contain millions of real-world movie ratings.
Ratings are given on a 5-star scale in half-star increments (0.5 to 5.0 stars).
The goal of this project is to find effective preprocessing steps and models, evaluated with metrics such as HR@10, NDCG@10, and RMSE. So far, the work covers only the RMSE metric.
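
For reference, here is a minimal sketch of the two ranking metrics. It assumes a leave-one-out style protocol in which each user has a single held-out item that is ranked among a list of candidates. That protocol is an assumption, since this project has not fixed one yet, and the function names are illustrative.

```python
import math

def hit_ratio_at_k(ranked_items, held_out_item, k=10):
    """Return 1.0 if the held-out item is among the top-k ranked items, else 0.0."""
    return 1.0 if held_out_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, held_out_item, k=10):
    """NDCG@k with exactly one relevant item.

    The ideal DCG is 1, so NDCG equals 1 / log2(rank + 1) for a 1-based rank
    inside the top-k, and 0 if the item is not in the top-k.
    """
    for position, item in enumerate(ranked_items[:k]):
        if item == held_out_item:
            return 1.0 / math.log2(position + 2)  # position is 0-based
    return 0.0

# Hypothetical ranking of movie IDs for one user, best first.
ranking = [42, 7, 13, 99]
hr = hit_ratio_at_k(ranking, held_out_item=13)
ndcg = ndcg_at_k(ranking, held_out_item=13)
```

Averaging both values over all test users gives HR@10 and NDCG@10 for a model.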

Related Work

Related work is organized in the "MovieLens Paper" and "Recommender-System-Study" folders.
The "MovieLens Paper" folder contains the results of studies that use the MovieLens dataset.
The "Recommender-System-Study" folder covers the basic concepts of recommender systems.

Preprocessing

Added a "timestamp" value: the time elapsed between the movie's release and when the user watched it.
Genre strings such as "Romance, Adventure" were converted to numerical values.
The "timestamp" value was grouped into two-week intervals and stored as a "day" value.
The data was split into train and test sets at a ratio of 9:1.
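
The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the notebook: timestamps are assumed to be Unix seconds, `release_ts` is a hypothetical per-movie release time, genres are assumed to be pipe-separated and encoded as a multi-hot vector, and the 9:1 split is assumed to be a random shuffle.

```python
import random

TWO_WEEKS_SECONDS = 14 * 24 * 60 * 60

def elapsed_bucket(rating_ts, release_ts):
    """Two-week bucket index of the time between release and rating.

    Negative gaps are clamped to 0 (a choice made for this sketch).
    """
    return max(rating_ts - release_ts, 0) // TWO_WEEKS_SECONDS

def build_genre_vocab(genre_strings, sep="|"):
    """Map every genre name to a stable integer index (alphabetical order)."""
    genres = sorted({g for s in genre_strings for g in s.split(sep)})
    return {g: i for i, g in enumerate(genres)}

def multi_hot(genre_string, vocab, sep="|"):
    """Encode one movie's genres as a 0/1 vector over the vocabulary."""
    vec = [0] * len(vocab)
    for g in genre_string.split(sep):
        vec[vocab[g]] = 1
    return vec

def train_test_split(rows, test_fraction=0.1, seed=0):
    """Shuffle a copy of the rows and split them into (train, test)."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

vocab = build_genre_vocab(["Adventure|Romance", "Comedy"])
romance_adventure = multi_hot("Adventure|Romance", vocab)
train, test = train_test_split(range(100))
```

If ratings for the same user should not leak across the split, a per-user or time-based split would be needed instead of the plain shuffle used here.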

Recommender System Model

Movie ratings are made on a 5-star scale (0.5 to 5.0 stars). A natural choice for the prediction is (sigmoid output) * 4.5 + 0.5, which ranges between 0.5 and 5.0. However, using (sigmoid output) * 5.5, which ranges between 0 and 5.5, gave much better accuracy in rating prediction. A small amount of extra prediction range helps.
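
The two output mappings can be compared with a small sketch. The helper names are illustrative, and `logit_needed` is only there to show how large the pre-activation must be to reach a given rating.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def scale_tight(logit):
    """sigmoid * 4.5 + 0.5: outputs lie strictly between 0.5 and 5.0."""
    return sigmoid(logit) * 4.5 + 0.5

def scale_wide(logit):
    """sigmoid * 5.5: outputs lie strictly between 0.0 and 5.5."""
    return sigmoid(logit) * 5.5

def logit_needed(target, scale, offset=0.0):
    """Pre-activation at which sigmoid(x) * scale + offset equals target.

    Only defined when (target - offset) / scale lies strictly between 0 and 1.
    """
    p = (target - offset) / scale
    return math.log(p / (1.0 - p))

# With the wide mapping, a 5.0-star prediction needs a finite logit (ln 10).
logit_for_five = logit_needed(5.0, scale=5.5)
```

With the tight mapping, 0.5 and 5.0 are only reached in the limit of an infinite logit, while the wide mapping reaches them at about -2.30 and +2.30. This may be part of why the extra range helps, but it is only a possible explanation and has not been verified here.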
L2 regularization helps prevent overfitting. Experimentally, the L2 regularization cost worked best when it was between 2% and 4% of the total cost.
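
One way to monitor this ratio is sketched below. It assumes that the total cost is the data loss plus a penalty of the form lambda * sum(w^2); the exact form used in the notebook may differ, and the function names are illustrative.

```python
def l2_penalty(weights, lam):
    """L2 regularization cost: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l2_share(data_loss, weights, lam):
    """Fraction of the total cost (data loss + penalty) that is the L2 penalty."""
    penalty = l2_penalty(weights, lam)
    return penalty / (data_loss + penalty)

def lambda_for_share(data_loss, weights, target_share):
    """lam that makes the penalty equal target_share of the total cost
    for the current weights and data loss."""
    squared_norm = sum(w * w for w in weights)
    return target_share * data_loss / ((1.0 - target_share) * squared_norm)

# Toy numbers: pick lam so the penalty is 3% of the total cost.
weights = [1.0, 2.0, 2.0]
lam = lambda_for_share(data_loss=0.97, weights=weights, target_share=0.03)
share = l2_share(0.97, weights, lam)
```

Because both the weights and the data loss change during training, the share drifts over time, so it is worth logging it rather than setting lambda once.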
The Wide & Deep model, which uses both matrix factorization and MLP layers, showed good performance.
Mixture-Rank Matrix Approximation (MRMA) combines matrix factorization models with various embedding sizes. Larger models perform better, at the cost of higher total time complexity.
Batch Normalization and Dropout gave poor results on both the train and test datasets.
The Adam optimizer performed better than the other optimization algorithms tried.
It is better to feed item metadata such as genres into the MLP input layer than into the matrix factorization part.
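
A minimal forward pass for this kind of hybrid is sketched below in plain Python. The layer sizes, the separate embeddings per branch, and the way the two branches are summed are all assumptions made for illustration; see the notebook for the actual architecture.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer; weights holds one row of input weights per output unit."""
    return [dot(row, inputs) + b for row, b in zip(weights, biases)]

def predict_rating(mf_user, mf_item, mlp_user, mlp_item, genre_vec,
                   hidden_w, hidden_b, out_w, out_b):
    # Matrix factorization branch: uses only the user and item embeddings.
    mf_score = dot(mf_user, mf_item)
    # MLP branch: genre metadata is concatenated into the input layer here only.
    mlp_input = mlp_user + mlp_item + genre_vec
    hidden = relu(dense(mlp_input, hidden_w, hidden_b))
    mlp_score = dot(out_w, hidden) + out_b
    # Sum both branches and map to the (0, 5.5) rating range.
    logit = mf_score + mlp_score
    return 5.5 / (1.0 + math.exp(-logit))

# Toy parameters: the MF branch gives +2 and the MLP branch gives -2.
rating = predict_rating(
    mf_user=[1.0, 0.0], mf_item=[2.0, 3.0],
    mlp_user=[1.0], mlp_item=[1.0], genre_vec=[1, 0],
    hidden_w=[[1.0, 1.0, 1.0, 0.0], [-1.0, -1.0, -1.0, 0.0]],
    hidden_b=[0.0, 0.0],
    out_w=[-1.0, 5.0], out_b=1.0,
)
```

In the toy call the two branches cancel, so the logit is 0 and the prediction is the midpoint of the output range.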

For more details about the model, see:
https://github.com/itemgiver/MovieLens/blob/main/src/MovieLens10M_RMSE_7672.ipynb

Result

Test RMSE = 0.7672
Other reported RMSE results on MovieLens 10M are listed at this link:
https://paperswithcode.com/sota/collaborative-filtering-on-movielens-10m?metric=RMSE

[Figure: Histogram of the difference between the actual and predicted values]
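
Both the RMSE and the error histogram can be computed from paired actual and predicted ratings, as in the sketch below. The bin width of 0.5 is an assumption, and the sample values are made up for illustration; they are not results from this project.

```python
import math
from collections import Counter

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def error_histogram(actual, predicted, bin_width=0.5):
    """Count errors (actual - predicted) per bin, keyed by the bin's lower edge."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        lower_edge = math.floor((a - p) / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))

# Made-up example values.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]
score = rmse(actual, predicted)
histogram = error_histogram(actual, predicted)
```

The histogram dictionary can be passed to any plotting library as bar positions and heights.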

References

https://grouplens.org/datasets/movielens/
