This application is a hybrid anime recommendation system that combines user-user collaborative filtering with content-based recommendations. It uses hnswlib for fast approximate nearest neighbor (ANN) search, a Flask backend for API handling, and a database accessed through SQLAlchemy for data storage.
- User-User Collaborative Filtering:
  - Finds similar users based on anime ratings.
  - Recommends anime liked by similar users.
- Content-Based Recommendations:
  - Suggests anime with similar genres to a user's favorites.
- Autocomplete API:
  - Provides anime name suggestions for user input.
- Optimized Performance:
  - Sparse matrix representation for memory efficiency.
  - hnswlib ANN indexing for fast nearest neighbor search.
- Scalable Architecture:
  - Handles thousands of users and anime with efficient memory usage.
- Load Environment Variables:
  - Load `DATABASE_URL` and `SECRET_KEY` from `.env`.
  - Throw errors if not set.
- Database Connection:
  - Establish a SQLAlchemy connection pool.
  - Verify connection liveness with `pool_pre_ping`.
- Load Anime Data:
  - Fetch `anime_id` and `name` into `anime_df` for fuzzy matching and mapping.
- Build hnswlib Index:
  - Check for existing index and mappings on disk.
  - If unavailable, build the user-anime interaction matrix and create a new index.
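The environment check in the first step could look roughly like the sketch below. It assumes the variables are already present in the process environment (for example after a `.env` loader has run); the helper name `load_settings` is hypothetical and not taken from the project.

```python
import os


def load_settings(env=None):
    """Return the required settings, raising if any is missing or empty."""
    # Hypothetical helper: reads from os.environ unless a mapping is passed in.
    env = os.environ if env is None else env
    required = ("DATABASE_URL", "SECRET_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in required}
```

The resulting `DATABASE_URL` would then typically be passed to SQLAlchemy's `create_engine(url, pool_pre_ping=True)`, which tests each pooled connection before handing it out.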
- Fetch Ratings:
  - Load user ratings (`user_id`, `anime_id`, `rating`) from the database.
  - Replace `-1` with `0` to indicate no rating.
- Map IDs to Indices:
  - Assign users to rows and anime to columns.
- Construct Matrix:
  - Create a Compressed Sparse Row (CSR) matrix, optimizing memory usage for large datasets.
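The three matrix-building steps above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes ratings arrive as `(user_id, anime_id, rating)` tuples with at most one rating per user and anime, and the function name `build_interaction_matrix` is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix


def build_interaction_matrix(ratings):
    """Build a user x anime CSR matrix from (user_id, anime_id, rating) tuples."""
    user_ids = sorted({u for u, _, _ in ratings})
    anime_ids = sorted({a for _, a, _ in ratings})
    user_index = {u: i for i, u in enumerate(user_ids)}    # user_id -> row
    anime_index = {a: j for j, a in enumerate(anime_ids)}  # anime_id -> column

    rows = np.array([user_index[u] for u, _, _ in ratings], dtype=np.int64)
    cols = np.array([anime_index[a] for _, a, _ in ratings], dtype=np.int64)
    # A rating of -1 means "no rating"; it is stored as 0.
    vals = np.array(
        [0.0 if r == -1 else float(r) for _, _, r in ratings], dtype=np.float32
    )

    matrix = csr_matrix((vals, (rows, cols)), shape=(len(user_ids), len(anime_ids)))
    matrix.eliminate_zeros()  # keep only real ratings in the sparse structure
    return matrix, user_index, anime_index
```

Returning the two index mappings alongside the matrix makes it possible to translate rows and columns back to database IDs later.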
- Find Similar Users:
  - Use hnswlib to query the k nearest neighbors (similar users).
- Aggregate Recommendations:
  - Collect anime highly rated by neighbors but not watched by the user.
  - Rank recommendations by rating count and score.
- Insert New Users:
  - If a user is not in the index:
    - Build their vector based on their anime list.
    - Add it to the index and update the database.
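The aggregation step might be implemented along these lines. The sketch assumes the neighbors have already been found and that each neighbor's ratings are available as a dictionary; the `min_rating` threshold of 8 and the function name are illustrative assumptions, since the source does not say what counts as "highly rated".

```python
from collections import defaultdict


def aggregate_recommendations(neighbor_ratings, watched, min_rating=8, top_n=10):
    """Rank unseen anime by high-rating count, then by mean of those ratings.

    neighbor_ratings: one {anime_id: rating} dict per similar user.
    watched: set of anime_ids the target user has already seen.
    """
    high_ratings = defaultdict(list)
    for ratings in neighbor_ratings:
        for anime_id, rating in ratings.items():
            if anime_id not in watched and rating >= min_rating:
                high_ratings[anime_id].append(rating)

    ranked = sorted(
        high_ratings.items(),
        key=lambda item: (len(item[1]), sum(item[1]) / len(item[1])),
        reverse=True,
    )
    return [anime_id for anime_id, _ in ranked[:top_n]]
```

Sorting on the pair (count, mean) favors titles that several neighbors agree on and uses the average score only to break ties.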
- TF-IDF Vectorization:
  - Represent anime genres using TF-IDF scores.
  - Convert genres into a weighted bag-of-words format.
- Calculate Similarity:
  - Use cosine similarity to find anime similar to the user's favorites.
- Generate Recommendations:
  - Rank and return the top recommendations, excluding anime the user already liked.
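To make the content-based steps concrete, here is a dependency-free sketch of TF-IDF weighting and cosine ranking over genre lists. It is an illustration under assumptions: the project may well use a library vectorizer instead, the smoothed IDF formula is one common choice, and combining the liked anime into a single summed profile vector is an assumption of this sketch. All function names and the sample catalog are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(genres_by_anime):
    """Map each anime to a {genre: weight} vector using a smoothed IDF."""
    n_docs = len(genres_by_anime)
    doc_freq = Counter(g for genres in genres_by_anime.values() for g in set(genres))
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1.0 for g, df in doc_freq.items()}
    return {
        anime: {g: count * idf[g] for g, count in Counter(genres).items()}
        for anime, genres in genres_by_anime.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse {genre: weight} vectors."""
    dot = sum(weight * b.get(genre, 0.0) for genre, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_by_genre(genres_by_anime, liked, top_n=5):
    """Rank anime the user has not liked yet by similarity to their liked set."""
    vectors = tfidf_vectors(genres_by_anime)
    profile = defaultdict(float)  # summed vector of the liked anime (assumption)
    for anime in liked:
        for genre, weight in vectors[anime].items():
            profile[genre] += weight
    candidates = [a for a in genres_by_anime if a not in liked]
    candidates.sort(key=lambda a: cosine(profile, vectors[a]), reverse=True)
    return candidates[:top_n]


# Made-up catalog for illustration only.
catalog = {
    "A": ["action", "adventure"],
    "B": ["action", "adventure", "fantasy"],
    "C": ["romance", "drama"],
    "D": ["action"],
}
top = recommend_by_genre(catalog, liked=["A"], top_n=2)
```

Because rare genres receive a higher IDF weight, sharing an uncommon genre counts for more than sharing a very common one.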
- Accepts a `user_id` and `anime_list`.
- Supports two algorithms:
  - `user-user` (default)
  - `content-based`
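A request body for this endpoint could be validated roughly as shown below. The sketch is framework-independent and makes several assumptions that are not in the source: the `algorithm` key name, the error messages, and the rule that `user_id` is only required for the `user-user` algorithm.

```python
ALGORITHMS = ("user-user", "content-based")


def parse_recommend_request(payload):
    """Validate a recommendation request body and apply the default algorithm."""
    if not isinstance(payload, dict):
        raise ValueError("request body must be a JSON object")

    anime_list = payload.get("anime_list")
    if not isinstance(anime_list, list) or not anime_list:
        raise ValueError("anime_list must be a non-empty list")

    # The "algorithm" key is a hypothetical name; user-user is the default.
    algorithm = payload.get("algorithm", "user-user")
    if algorithm not in ALGORITHMS:
        raise ValueError(f"algorithm must be one of {ALGORITHMS}")

    user_id = payload.get("user_id")
    if algorithm == "user-user" and user_id is None:
        raise ValueError("user_id is required for user-user recommendations")

    return {"user_id": user_id, "anime_list": anime_list, "algorithm": algorithm}
```

In a Flask view, the dictionary returned by `request.get_json()` would be passed to this function, and a `ValueError` would be mapped to a 400 response.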
- Returns anime name suggestions based on a query string.
- hnswlib:
  - Fast ANN search for user similarity.
  - Space-efficient graph-based indexing.
- Flask:
  - Lightweight backend for API handling.
  - Supports CORS for cross-origin requests.
- SQLAlchemy:
  - Efficient ORM for database interactions.
  - Connection pooling for scalability.
- scipy.sparse:
  - Memory-efficient representation of sparse matrices.
- TF-IDF:
  - Transform anime genres into numerical vectors for similarity calculations.
- Sparse Matrix Representation:
  - Saves memory by storing only non-zero ratings.
- hnswlib Indexing:
  - Provides approximately O(log N) query time for nearest neighbor searches.
- Profiling Tools:
  - `@timeit_decorator`: Measures function execution time.
  - `@memory_profiler_decorator`: Tracks memory usage.
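A timing decorator of the kind listed above can be written with the standard library alone. The body below is a guess at how `@timeit_decorator` might work, not the project's actual implementation, and `slow_square` is a made-up function used only to demonstrate it.

```python
import functools
import time


def timeit_decorator(func):
    """Print how long each call to func takes, then return its result."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs even if func raises, so failed calls are timed too.
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.4f}s")
    return wrapper


@timeit_decorator
def slow_square(x):
    time.sleep(0.01)  # stand-in for real work
    return x * x
```

`functools.wraps` keeps the wrapped function's name and docstring, which matters when several decorated functions appear in the same log.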
- Load environment variables.
- Establish a database connection.
- Fetch anime data into memory.
- Build or load the hnswlib index.
- Accept `user_id` and `anime_list` via API.
- Insert the user into the hnswlib index if new.
- Query the index for similar users.
- Aggregate and rank recommendations.
- Accept `anime_list` via API.
- Compute cosine similarity between the user's liked anime and all anime.
- Rank and return top results.
- Accept a query string via API.
- Fetch matching anime names from database.
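The name lookup could be a simple prefix query. The sketch below uses the standard-library `sqlite3` module with an in-memory database so that it runs on its own; the application itself goes through SQLAlchemy, and the table name `anime`, the prefix-matching rule, and the `limit` default are assumptions.

```python
import sqlite3


def autocomplete(conn, query, limit=5):
    """Return up to `limit` anime names that start with `query`."""
    # Escape LIKE wildcards so user input is matched literally.
    escaped = query.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name FROM anime WHERE name LIKE ? ESCAPE '\\' ORDER BY name LIMIT ?",
        (escaped + "%", limit),
    ).fetchall()
    return [name for (name,) in rows]


# In-memory sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE anime (anime_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO anime VALUES (?, ?)",
    [(1, "Naruto"), (2, "Naruto Shippuden"), (3, "One Piece")],
)
suggestions = autocomplete(conn, "naru")
```

Passing the query as a bound parameter rather than formatting it into the SQL string keeps the endpoint safe from SQL injection; SQLite's `LIKE` is case-insensitive for ASCII characters by default.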
- Built a hybrid recommendation system combining collaborative filtering and content-based approaches, demonstrating expertise in data science and backend engineering.
- Integrated and applied algorithms, data structures, and cloud-based tools to solve real-world problems.
- Gained confidence in designing, profiling, and scaling backend systems for high-performance applications.





