
This application is a hybrid anime recommendation system combining collaborative filtering (user-user) with content-based recommendations. It uses hnswlib for fast approximate nearest neighbor (ANN) search, a Flask backend for API handling, and a SQLAlchemy database for data storage.


Th3red/recommender-app_anime


Anime Recommendation System: Hybrid Approach with Flask and hnswlib

Overview

This application is a hybrid anime recommendation system combining collaborative filtering (user-user) with content-based recommendations. It uses hnswlib for fast approximate nearest neighbor (ANN) search, a Flask backend for API handling, and a SQLAlchemy database for data storage.


Features

  • User-User Collaborative Filtering:
    • Finds similar users based on anime ratings.
    • Recommends anime liked by similar users.
  • Content-Based Recommendations:
    • Suggests anime with similar genres to a user's favorites.
  • Autocomplete API:
    • Provides anime name suggestions for user input.
  • Optimized Performance:
    • Sparse matrix representation for memory efficiency.
    • hnswlib ANN indexing for fast nearest neighbor search.
  • Scalable Architecture:
    • Handles thousands of users and anime with efficient memory usage.

Frontend User-User

[Image: User-User]

Backend

[Image: User-User]

Frontend Content-Based

[Image: Content-based]

Backend

[Image: Content-based]

Other Frontend Tools

Fetch favorites from MAL

[Image: favorites-based]

Suggestions

[Image: suggestions-based]

Anime Recommendation System: Key Takeaways

System Workflow

1. Initialization

  1. Load Environment Variables:

    • Load DATABASE_URL and SECRET_KEY from .env.
    • Raise an error at startup if either variable is not set.
  2. Database Connection:

    • Establish a SQLAlchemy connection pool.
    • Verify connection liveness with pool_pre_ping.
  3. Load Anime Data:

    • Fetch anime_id and name into anime_df for fuzzy matching and mapping.
  4. Build hnswlib Index:

    • Check for existing index and mappings on disk.
    • If unavailable, build the user-anime interaction matrix and create a new index.
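The first two initialization steps can be sketched as below. This is a minimal illustration, not the repository's code: `require_env`, `make_engine`, and `check_connection` are hypothetical names, and the sketch assumes the `.env` file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os

from sqlalchemy import create_engine, text


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail fast if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def make_engine(database_url: str):
    """Create a pooled engine (hypothetical helper).

    pool_pre_ping=True makes SQLAlchemy test each pooled connection before
    handing it out and transparently replace connections that have gone stale.
    """
    return create_engine(database_url, pool_pre_ping=True)


def check_connection(engine) -> bool:
    """Run a trivial query to confirm the database is reachable."""
    with engine.connect() as conn:
        return conn.execute(text("SELECT 1")).scalar() == 1
```

At startup this would be called roughly as `engine = make_engine(require_env("DATABASE_URL"))`, with `require_env("SECRET_KEY")` checked the same way.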

2. Data Preparation

Sparse User-Anime Matrix

  • Fetch Ratings:
    • Load user ratings (user_id, anime_id, rating) from the database.
    • Replace -1 with 0 to indicate no rating.
  • Map IDs to Indices:
    • Assign users to rows and anime to columns.
  • Construct Matrix:
    • Create a Compressed Sparse Row (CSR) matrix, optimizing memory usage for large datasets.
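A small sketch of the matrix construction described above. The function name and the in-memory list of rating tuples are assumptions made for illustration; the application loads the ratings from the database.

```python
import numpy as np
from scipy.sparse import csr_matrix


def build_user_anime_matrix(ratings):
    """Build a CSR user-anime matrix from (user_id, anime_id, rating) tuples.

    Returns the matrix plus the ID-to-index mappings for rows and columns.
    """
    ratings = list(ratings)
    user_ids = sorted({u for u, _, _ in ratings})
    anime_ids = sorted({a for _, a, _ in ratings})
    user_to_row = {u: i for i, u in enumerate(user_ids)}
    anime_to_col = {a: j for j, a in enumerate(anime_ids)}

    rows = [user_to_row[u] for u, _, _ in ratings]
    cols = [anime_to_col[a] for _, a, _ in ratings]
    # A rating of -1 is replaced with 0, meaning "no rating".
    vals = [0.0 if r == -1 else float(r) for _, _, r in ratings]

    matrix = csr_matrix(
        (vals, (rows, cols)),
        shape=(len(user_ids), len(anime_ids)),
        dtype=np.float32,
    )
    # Drop the explicit zeros so that only real ratings are stored.
    matrix.eliminate_zeros()
    return matrix, user_to_row, anime_to_col
```

Keeping the two mappings alongside the matrix is what lets index labels be translated back into user and anime IDs later.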

3. Recommendation Algorithms

Collaborative Filtering (User-User)

  1. Find Similar Users:

    • Use hnswlib to query the k nearest neighbors (similar users).
  2. Aggregate Recommendations:

    • Collect anime highly rated by neighbors but not watched by the user.
    • Rank recommendations by rating count and score.
  3. Insert New Users:

    • If a user is not in the index:
      • Build their vector based on their anime list.
      • Add it to the index and update the database.

Content-Based Recommendations

  1. TF-IDF Vectorization:

    • Represent anime genres using TF-IDF scores.
    • Convert genres into a weighted bag-of-words format.
  2. Calculate Similarity:

    • Use cosine similarity to find anime similar to the user's favorites.
  3. Generate Recommendations:

    • Rank and return the top recommendations, excluding anime the user already liked.
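The three steps above can be sketched with scikit-learn. This assumes genres are stored as comma-separated strings and that similarity to several favorites is combined by averaging; both are assumptions for illustration, as are the function names.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def split_genres(genre_text):
    """Treat each comma-separated genre as one token, e.g. 'Sci-Fi' stays whole."""
    return [g.strip().lower() for g in genre_text.split(",") if g.strip()]


def content_based(genres_by_anime, liked_ids, top_n=5):
    """Rank anime by mean cosine similarity of their genre TF-IDF vectors
    to the liked anime, excluding the liked anime themselves."""
    anime_ids = list(genres_by_anime)
    docs = [genres_by_anime[a] for a in anime_ids]
    tfidf = TfidfVectorizer(analyzer=split_genres).fit_transform(docs)

    liked = set(liked_ids)
    liked_rows = [i for i, a in enumerate(anime_ids) if a in liked]
    if not liked_rows:
        return []

    # One row of similarities per liked anime, averaged into a single score.
    scores = cosine_similarity(tfidf[liked_rows], tfidf).mean(axis=0)
    order = np.argsort(-scores)
    return [anime_ids[i] for i in order if anime_ids[i] not in liked][:top_n]
```

Passing a custom `analyzer` keeps multi-word or hyphenated genres as single tokens, which the default word tokenizer would otherwise split.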

4. Endpoints

/users/recommendations [POST]

  • Accepts a user_id and anime_list.
  • Supports two algorithms:
    • user-user (default)
    • content-based
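A possible shape for this endpoint in Flask is sketched below. The JSON field name `algorithm`, the error messages, and the two placeholder recommender functions are assumptions; only the route, the `user_id` and `anime_list` inputs, and the two algorithm names come from the description above. CORS setup is not shown.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def recommend_user_user(user_id, anime_list):
    """Placeholder for the collaborative filtering path (hypothetical)."""
    return []


def recommend_content_based(anime_list):
    """Placeholder for the content-based path (hypothetical)."""
    return []


@app.route("/users/recommendations", methods=["POST"])
def recommendations():
    payload = request.get_json(silent=True) or {}
    user_id = payload.get("user_id")
    anime_list = payload.get("anime_list")
    # "algorithm" is an assumed field name; user-user is the default.
    algorithm = payload.get("algorithm", "user-user")

    if user_id is None or not isinstance(anime_list, list):
        return jsonify(error="user_id and anime_list are required"), 400

    if algorithm == "user-user":
        recs = recommend_user_user(user_id, anime_list)
    elif algorithm == "content-based":
        recs = recommend_content_based(anime_list)
    else:
        return jsonify(error=f"unknown algorithm: {algorithm}"), 400

    return jsonify(algorithm=algorithm, recommendations=recs)
```

Validating the payload before dispatching keeps malformed requests from reaching the index or the database.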

/api/anime-suggestions [GET]

  • Returns anime name suggestions based on a query string.

Key Technologies

  • hnswlib:

    • Fast ANN search for user similarity.
    • Space-efficient graph-based indexing.
  • Flask:

    • Lightweight backend for API handling.
    • Supports CORS for cross-origin requests.
  • SQLAlchemy:

    • Efficient ORM for database interactions.
    • Connection pooling for scalability.
  • scipy.sparse:

    • Memory-efficient representation of sparse matrices.
  • TF-IDF:

    • Transform anime genres into numerical vectors for similarity calculations.

Performance Optimizations

  1. Sparse Matrix Representation:

    • Saves memory by storing only non-zero ratings.
  2. hnswlib Indexing:

    • Provides roughly logarithmic, O(log N), average query time for approximate nearest neighbor searches.
  3. Profiling Tools:

    • @timeit_decorator: Measures function execution time.
    • @memory_profiler_decorator: Tracks memory usage.
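The two decorator names come from the project, but their implementation is not shown here. The following is one way such decorators could be written using only the standard library (`time.perf_counter` and `tracemalloc`); the real ones may rely on a different tool.

```python
import functools
import time
import tracemalloc


def timeit_decorator(func):
    """Print the wall-clock time taken by each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start
            print(f"{func.__name__} took {elapsed:.4f}s")
    return wrapper


def memory_profiler_decorator(func):
    """Print the peak Python memory allocated during each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        try:
            return func(*args, **kwargs)
        finally:
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            print(f"{func.__name__} peak memory: {peak / 1024:.1f} KiB")
    return wrapper
```

Both wrappers use `functools.wraps`, so a decorated function keeps its name and docstring, and they can be stacked on the same function.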

Execution Flow

Startup Note (local)

  1. Load environment variables.
  2. Establish a database connection.
  3. Fetch anime data into memory.
  4. Build or load the hnswlib index.

User-User Recommendations

  1. Accept user_id and anime_list via API.
  2. Insert the user into the hnswlib index if new.
  3. Query the index for similar users.
  4. Aggregate and rank recommendations.
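The aggregation in step 4 could be implemented along these lines. The `min_rating` threshold of 8 for "highly rated", the tie-breaking by mean rating, and the input format (one `{anime_id: rating}` dict per similar user) are assumptions chosen for illustration.

```python
from collections import defaultdict


def aggregate_recommendations(neighbor_ratings, watched, min_rating=8, top_n=10):
    """Rank unseen anime by how many neighbors rated them highly,
    breaking ties by the mean of those ratings."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for ratings in neighbor_ratings:
        for anime_id, rating in ratings.items():
            # Skip anime the user has already watched and low ratings.
            if anime_id in watched or rating < min_rating:
                continue
            counts[anime_id] += 1
            totals[anime_id] += rating

    ranked = sorted(
        counts,
        key=lambda a: (counts[a], totals[a] / counts[a]),
        reverse=True,
    )
    return ranked[:top_n]
```

Ranking by count first favors titles that several similar users agree on over a single enthusiastic rating.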

Content-Based Recommendations

  1. Accept anime_list via API.
  2. Compute cosine similarity between the user's liked anime and all anime.
  3. Rank and return top results.

Autocomplete

  1. Accept a query string via API.
  2. Fetch matching anime names from the database.
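As an illustration of the matching step, the sketch below ranks prefix matches first, then substring matches, then fuzzy matches from the standard library's `difflib`. It works on an in-memory list of names rather than a database query, and the ordering, the `limit`, and the `cutoff` value are assumptions.

```python
import difflib


def suggest(query, names, limit=5):
    """Return up to `limit` anime names matching the query string."""
    q = query.strip().lower()
    if not q:
        return []

    # Exact prefix matches are the most useful for autocomplete.
    prefix = [n for n in names if n.lower().startswith(q)]
    contains = [n for n in names if q in n.lower() and n not in prefix]
    results = prefix + contains

    # Fall back to fuzzy matching to tolerate typos.
    if len(results) < limit:
        fuzzy = difflib.get_close_matches(query, names, n=limit, cutoff=0.6)
        results += [n for n in fuzzy if n not in results]
    return results[:limit]
```

For example, `suggest("Bleech", ["Naruto", "Bleach"])` finds no prefix or substring match and relies on the fuzzy fallback to return `Bleach`.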

Key Outcomes

  • Built a hybrid recommendation system combining collaborative filtering and content-based approaches, demonstrating expertise in data science and backend engineering.
  • Integrated and applied algorithms, data structures, and cloud-based tools to solve real-world problems.
  • Gained confidence in designing, profiling, and scaling backend systems for high-performance applications.
