OptiPick is an analytics tool that helps users make informed purchasing decisions by analyzing Amazon product reviews. It combines web scraping, sentiment analysis, and machine learning to report on review content, feature-level sentiment, and buying trends. It offers multi-level sentiment classification, aspect-based analysis, and AI-powered summaries of product feedback and customer experiences.
- Smart URL Processing: Handles both full Amazon URLs and short links (amzn.in)
- Product Details Extraction: Gets product title, price, ratings, images, descriptions, and more
- Review Scraping: Fetches up to 200 reviews per product with metadata (date, country, verification status)
- NLTK VADER Sentiment: Fast lexicon-based sentiment scoring (compound score from -1 to +1)
- Multi-level Classification: Positive, Negative, and Neutral categorization
- NPS Calculation: Net Promoter Score derived from 5-star ratings
- Feature Keywords: TF-IDF based extraction of most discussed product features
- Aspect-Based Sentiment: Identifies specific product aspects and their sentiment
- Word Cloud Generation: Visual representation of review content
- Review Complexity Analysis: Sentence structure, subjectivity, and writing patterns
- Monthly Trend Analysis: Sentiment trends over time
- GPT-Powered Insights: Optional AI analysis using OpenAI API
- Modern Streamlit Interface: Clean, responsive design with dark theme
- Tabbed Navigation: Dashboard, Reviews, Compare, AI Summary, and Advanced NLP
- Product Comparison: Side-by-side analysis of two products
- Category Search: Browse and analyze product categories
- Interactive Charts: Trend visualization and sentiment distribution
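The NPS calculation listed among the features above can be sketched as follows. The mapping from stars to promoters and detractors is an assumption (5 stars = promoter, 4 stars = passive, 1 to 3 stars = detractor); the project does not state which mapping it uses, and `nps_from_stars` is a hypothetical name.

```python
from typing import Iterable


def nps_from_stars(ratings: Iterable[int]) -> float:
    """Return a Net Promoter Score in [-100, 100] from 1-5 star ratings.

    Assumed mapping (not confirmed by the project):
    5 stars = promoter, 4 stars = passive, 1-3 stars = detractor.
    """
    ratings = list(ratings)
    if not ratings:
        # No ratings: report a neutral score instead of dividing by zero.
        return 0.0
    promoters = sum(1 for r in ratings if r >= 5)
    detractors = sum(1 for r in ratings if r <= 3)
    # NPS = % promoters - % detractors
    return 100.0 * (promoters - detractors) / len(ratings)


if __name__ == "__main__":
    print(nps_from_stars([5, 5, 5, 4]))  # three promoters, one passive
```

Passives count toward the total but toward neither group, so a product rated mostly 4 stars ends up near zero under this mapping.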
- Python 3.8+
- Apify account with API token
- OpenAI API key (optional, for AI summaries)
- Clone the repository:
      git clone <repository-url>
      cd optipick
- Install dependencies:
      pip install -r requirements.txt
- Set up environment variables. Create a .streamlit/secrets.toml file:
      APIFY_TOKEN = "your_apify_token_here"
      OPENAI_API_KEY = "your_openai_key_here"  # Optional
  Or set environment variables:
      export APIFY_TOKEN="your_apify_token_here"
      export OPENAI_API_KEY="your_openai_key_here"
- Run the application:
      streamlit run app.py
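As a rough illustration of how the two tokens from the setup steps might be read at startup, here is a minimal sketch covering only the environment-variable route. `load_tokens` is a hypothetical helper, not a function from this repository; inside a Streamlit app, values from .streamlit/secrets.toml are available through `st.secrets` instead.

```python
import os
from typing import Mapping, Optional, Tuple


def load_tokens(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (apify_token, openai_key_or_None) from an environment mapping.

    APIFY_TOKEN is required; OPENAI_API_KEY is optional because AI summaries
    are an optional feature.
    """
    apify_token = env.get("APIFY_TOKEN", "").strip()
    if not apify_token:
        raise RuntimeError("APIFY_TOKEN is not set")
    # Treat an empty or whitespace-only key as "not configured".
    openai_key = env.get("OPENAI_API_KEY", "").strip() or None
    return apify_token, openai_key
```

Returning `None` for a missing OpenAI key lets the caller skip AI summaries without treating the missing key as an error.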
- Product Details: XVDTQc4a7MDTqSTMJ - Extracts product information
- Reviews: R8WeJwLuzLZ6g4Bkk - Scrapes customer reviews
- Full Amazon URLs: https://www.amazon.com/dp/ASIN or https://www.amazon.in/dp/ASIN
- Short URLs: https://amzn.in/...
- Category/Search URLs: https://www.amazon.com/s?k=keyword
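The three URL shapes above could be told apart roughly as in the sketch below. This is an illustration under assumptions, not the logic in scraper.py: `classify_url` and `extract_asin` are hypothetical names, the ASIN pattern assumes the usual 10-character uppercase alphanumeric form, and `B0EXAMPLE1` is a placeholder ASIN. Resolving a short link to its full URL needs an HTTP request that follows redirects, which is not shown.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Assumed ASIN shape: 10 uppercase letters/digits after /dp/ or /gp/product/.
ASIN_RE = re.compile(r"/(?:dp|gp/product)/([A-Z0-9]{10})(?:/|$)")


def extract_asin(url: str) -> Optional[str]:
    """Return the ASIN found in the URL path, or None if there is none."""
    match = ASIN_RE.search(urlparse(url).path)
    return match.group(1) if match else None


def classify_url(url: str) -> str:
    """Classify a URL as 'product', 'short', 'search', or 'unknown'."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host == "amzn.in":
        return "short"  # must be resolved via redirect before scraping
    if "amazon." in host:
        if extract_asin(url):
            return "product"
        if parsed.path.rstrip("/") == "/s":
            return "search"
    return "unknown"


if __name__ == "__main__":
    for u in (
        "https://www.amazon.com/dp/B0EXAMPLE1",
        "https://amzn.in/d/abc",
        "https://www.amazon.com/s?k=keyword",
    ):
        print(u, "->", classify_url(u))
```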
optipick/
├── app.py              # Main Streamlit application
├── nlp_utils.py        # Advanced NLP processing functions
├── components.py       # UI components and utilities
├── scraper.py          # Web scraping utilities
├── analyzer.py         # Data analysis functions
├── summarizer.py       # Text summarization utilities
├── utils.py            # General utility functions
├── requirements.txt    # Python dependencies
└── README.md           # This file
- Paste an Amazon product URL in the sidebar
- Adjust max reviews (20-200)
- Click "Fetch"
- Explore the different tabs for insights
- Analyze first product (Product A)
- Add second product URL in "Compare" section
- Click "Fetch B"
- View side-by-side comparison in Compare tab
- Go to "Category Search" tab in sidebar
- Enter Amazon search/category URL
- Set max products to analyze
- Browse results in the displayed table
- Product Header: Image, title, brand, pricing, ratings
- Sentiment Metrics: Positive, negative, neutral counts with NPS
- Monthly Trends: Time-series chart of sentiment over time
- Review Highlights: Best, worst, and most informative reviews
- Feature Keywords: Most discussed product aspects
- Complete Review Table: All scraped reviews with ratings, sentiment scores
- Sortable Columns: Date, country, verification status, sentiment
- Detailed Metadata: Review URLs, user verification status
- Aspect Analysis: Product features with associated sentiment
- Word Cloud: Visual representation of review content
- Complexity Stats: Writing patterns and review characteristics
- Detailed Sentiment: 5-level sentiment classification
- Key Phrases: Important terms and their frequency
- GPT Analysis: AI-powered insights (requires OpenAI API)
- Text Preprocessing: URL removal, normalization, cleaning
- VADER Scoring: Compound sentiment scores (-1 to +1)
- Classification: Threshold-based positive/negative/neutral labeling
- Feature Extraction: TF-IDF based keyword identification
- Aspect Mining: Entity and sentiment association
- Date Parsing: Multiple format support for review dates
- Price Handling: Currency normalization and formatting
- Review Filtering: Removes empty or invalid reviews
- Deduplication: Prevents duplicate review entries
- Streamlit Caching: Cached sentiment analyzer and computations
- Chunked Processing: Handles large review datasets efficiently
- Error Handling: Graceful fallbacks for API failures
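Several of the processing steps listed above (URL removal, threshold-based labeling, filtering of empty reviews, deduplication) can be sketched in plain Python. This is a sketch under assumptions rather than the code in nlp_utils.py: the function names and the `text` field are hypothetical, and the 0.05 cut-off is the value commonly recommended for VADER compound scores, since the project's own thresholds are not stated.

```python
import re
from typing import Dict, Iterable, List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
WS_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove URLs and collapse runs of whitespace."""
    text = URL_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


def label_from_compound(score: float, threshold: float = 0.05) -> str:
    """Map a compound score in [-1, 1] to a sentiment label.

    With NLTK, the score would come from
    SentimentIntensityAnalyzer().polarity_scores(text)["compound"]
    (requires the 'vader_lexicon' resource to be downloaded).
    """
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


def prepare_reviews(reviews: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Clean review text, drop empty reviews, and remove duplicates."""
    seen = set()
    prepared = []
    for review in reviews:
        text = clean_text(review.get("text") or "")
        if not text:
            continue  # empty or whitespace-only review
        key = text.casefold()  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        prepared.append({**review, "text": text})
    return prepared
```

Deduplicating on the cleaned, case-folded text catches copies that differ only in spacing or capitalization; near-duplicates with different wording would need a fuzzier comparison.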
The application includes robust error handling for:
- Invalid or inaccessible URLs
- Apify API failures and rate limits
- Missing or malformed data
- Network connectivity issues
- OpenAI API errors (graceful degradation)
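The graceful degradation for OpenAI errors could look roughly like the sketch below. The names `summarize_with_fallback` and `simple_fallback` are hypothetical, and the fallback shown is a placeholder, not the project's rule-based summarizer. A real implementation would catch the specific exceptions raised by the OpenAI client rather than a broad `Exception`.

```python
import logging
from typing import Callable, List, Optional, Tuple

logger = logging.getLogger(__name__)

Summarizer = Callable[[List[str]], str]


def simple_fallback(reviews: List[str]) -> str:
    """Placeholder rule-based summary: report how many reviews were seen."""
    return f"{len(reviews)} reviews analyzed; AI summary unavailable."


def summarize_with_fallback(
    reviews: List[str],
    ai_summarizer: Optional[Summarizer],
    fallback: Summarizer = simple_fallback,
) -> Tuple[str, str]:
    """Return (summary, source) where source is 'ai' or 'fallback'."""
    if ai_summarizer is not None:
        try:
            summary = ai_summarizer(reviews)
            if summary and summary.strip():
                return summary.strip(), "ai"
        except Exception as exc:  # broad on purpose in this sketch
            logger.warning("AI summary failed, using fallback: %s", exc)
    # No AI summarizer configured, empty result, or an error occurred.
    return fallback(reviews), "fallback"
```

Returning the source alongside the summary lets the UI indicate when the text did not come from the AI model.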
streamlit>=1.28.0
pandas>=1.5.0
numpy>=1.21.0
nltk>=3.8
scikit-learn>=1.1.0
apify-client>=1.4.0
requests>=2.28.0
altair>=4.2.0
wordcloud>=1.9.0
textblob>=0.17.0
openai>=1.0.0 # For AI summaries
matplotlib>=3.6.0 # For additional visualizations
- Review scraping: Up to 200 reviews per product
- Category scraping: Up to 200 products per search
- Rate limits apply based on your Apify subscription
- OpenAI API is used only for AI summaries (optional feature)
- Minimal token usage per analysis
- Graceful fallback to rule-based summaries
Extend the nlp_utils.py file to add custom sentiment analysis models.
Modify scraper.py to add support for additional e-commerce platforms.
Customize the CSS in app.py for different visual themes.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Streamlit for the amazing web framework
- NLTK for natural language processing tools
- Apify for reliable web scraping infrastructure
- OpenAI for advanced AI capabilities
- scikit-learn for machine learning utilities
For support, questions, or feature requests:
- Open an issue on GitHub
- Check existing documentation
- Review the troubleshooting section
- Initial release with full functionality
- Support for Amazon product analysis
- Advanced NLP features
- Product comparison capabilities
- Modern UI with dark theme
Built with ❤️ using Python and Streamlit