Shriyashzzz/Gutenberg_Project
# Project Gutenberg Word Frequency Analyzer

A web application that searches and analyzes word frequencies in Project Gutenberg books. The application stores results in a local SQLite database for quick retrieval.

## Features

- **Search by Title**: Query the local database for previously analyzed books
- **Word Frequency Analysis**: Automatically analyzes and displays the top 10 most frequent words
- **Web Scraping**: Fetches books directly from Project Gutenberg
- **Local Database**: Stores book information and word frequencies using SQLite3
- **Modern UI**: Clean, responsive web interface built with Flask
- **Stop Word Filtering**: Automatically filters out common words that don't add meaning

## Technologies Used

- Python 3.x
- Flask - Web framework
- SQLite3 - Local database
- Requests - Web scraping
- HTML/CSS/JavaScript - Frontend interface

## Installation

1. Clone this repository:

   ```
   cd gutenberg-analyzer
   ```

2. Install required packages:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python app.py
   ```

4. Open your browser and navigate to: http://localhost:5000

## Usage

### Search by Title

1. Enter a book title in the "Search by Title" field
2. Click "Search Database"
3. If found, the top 10 most frequent words will be displayed

### Add New Book

1. Find a book on Project Gutenberg
2. Copy the plain text URL (usually ends with .txt)
3. Paste the URL in the "Add New Book by URL" field
4. Click "Analyze & Store"
5. The book will be analyzed and stored in the database

### Example URLs

- Little Women: https://www.gutenberg.org/cache/epub/37106/pg37106.txt
- Pride and Prejudice: https://www.gutenberg.org/cache/epub/1342/pg1342.txt
- Alice in Wonderland: https://www.gutenberg.org/cache/epub/11/pg11.txt

## Database Schema

### Books Table

- id (INTEGER PRIMARY KEY)
- title (TEXT UNIQUE)
- url (TEXT)

### Word Frequencies Table

- id (INTEGER PRIMARY KEY)
- book_id (INTEGER FOREIGN KEY)
- word (TEXT)
- frequency (INTEGER)

## Project Structure

```
gutenberg-analyzer/
├── app.py              # Main application file
├── requirements.txt    # Python dependencies
├── README.md           # This file
└── gutenberg_books.db  # SQLite database (created on first run)
```

## Exception Handling

The application includes comprehensive error handling for:

- Database connection errors
- Network request failures
- Invalid URLs
- Missing book titles
- Text parsing errors

## Stop Words

The application filters out common English words that don't contribute to meaning, including:

- Articles (the, a, an)
- Pronouns (I, you, he, she, it)
- Prepositions (in, on, at, to, from)
- Conjunctions (and, or, but)
- Common verbs (is, are, was, were, be)

## Future Enhancements

- User authentication
- Book recommendation system
- Advanced filtering options
- Export results to CSV
- Visualization of word frequencies
- Support for multiple languages

## Author

Shriyash Ghimire

Date: 3 December 2025

## License

This project is created for educational purposes as part of a course assignment.
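
To illustrate the word frequency analysis and stop word filtering described above, here is a minimal sketch that may differ from what `app.py` actually does. The names `STOP_WORDS` and `top_words` are hypothetical, and the stop-word set contains only the examples listed in this README; the application's real list is presumably longer.

```python
import re
from collections import Counter

# Hypothetical stop-word set built only from the examples in this README;
# the application's actual list is not shown here.
STOP_WORDS = {
    "the", "a", "an",                     # articles
    "i", "you", "he", "she", "it",        # pronouns
    "in", "on", "at", "to", "from",       # prepositions
    "and", "or", "but",                   # conjunctions
    "is", "are", "was", "were", "be",     # common verbs
}


def top_words(text, n=10, stop_words=STOP_WORDS):
    """Return the n most frequent non-stop words as (word, count) pairs."""
    # Lowercase the text and keep runs of letters, allowing internal
    # apostrophes so that "don't" stays a single token.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)
```

Calling `top_words(book_text)` on the plain text of a book would give the kind of top 10 list the search page displays. The tokenizer here only recognizes ASCII letters, which is a simplifying assumption.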
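
The Database Schema section above could be created and queried along the following lines. This is a sketch under stated assumptions: the table names `books` and `word_frequencies` and the helpers `init_db`, `store_book`, and `search_by_title` are hypothetical, chosen to match the column lists in this README rather than taken from `app.py`.

```python
import sqlite3


def init_db(path="gutenberg_books.db"):
    """Open the database and create both tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    # Table names are assumptions; columns follow the README's schema.
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS books (
            id    INTEGER PRIMARY KEY,
            title TEXT UNIQUE,
            url   TEXT
        );
        CREATE TABLE IF NOT EXISTS word_frequencies (
            id        INTEGER PRIMARY KEY,
            book_id   INTEGER,
            word      TEXT,
            frequency INTEGER,
            FOREIGN KEY (book_id) REFERENCES books (id)
        );
    """)
    return conn


def store_book(conn, title, url, frequencies):
    """Insert a book and its (word, frequency) pairs in one transaction."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO books (title, url) VALUES (?, ?)", (title, url)
        )
        book_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO word_frequencies (book_id, word, frequency) "
            "VALUES (?, ?, ?)",
            [(book_id, word, freq) for word, freq in frequencies],
        )
    return book_id


def search_by_title(conn, title):
    """Return up to 10 (word, frequency) rows for an exact title match."""
    return conn.execute(
        "SELECT wf.word, wf.frequency "
        "FROM word_frequencies AS wf "
        "JOIN books AS b ON b.id = wf.book_id "
        "WHERE b.title = ? "
        "ORDER BY wf.frequency DESC LIMIT 10",
        (title,),
    ).fetchall()
```

Because `title` is declared `UNIQUE`, storing the same title twice raises `sqlite3.IntegrityError`, which a caller would need to handle. Passing `":memory:"` as the path gives a throwaway database for experiments.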