Autonomous Business Intelligence System by Algorzen
An enterprise-grade data analytics automation platform that transforms raw datasets into executive-level business intelligence reports with AI-powered narratives, comprehensive EDA, and professional PDF outputs.
- Automatic Dataset Detection: Identifies sales, finance, customer, or general data types
- Comprehensive EDA: Missing values, statistics, correlations, distributions
- Smart KPI Extraction: Context-aware metrics based on dataset characteristics
- Interactive Visualizations: Heatmaps, distributions, and statistical plots
- GPT-4 Integration: Executive-level narratives with strategic recommendations
- Fallback Intelligence: Rule-based narrative generation when the API is unavailable
- Business Tone: Professional, McKinsey-style executive summaries
- Actionable Recommendations: Data-driven strategic insights
- Branded PDF Reports: Eviden formatting
- Executive Presentation Quality: Ready for stakeholder meetings
- Metadata Tracking: JSON reports with full traceability
- Multi-Format Support: CSV, Excel, Parquet inputs
- Streamlit Web UI: User-friendly drag-and-drop interface
- CLI Tool: Scriptable command-line automation
- Modular API: Integrate into existing pipelines
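Multi-format support can be pictured as dispatching on the file extension. The sketch below is an assumption about how such a loader might work, not the project's own `src.utils.load_dataset`; `reader_name` and `load_any` are hypothetical names, while `pd.read_csv`, `pd.read_excel`, and `pd.read_parquet` are real pandas readers.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the pandas reader that handles it.
# read_excel needs openpyxl for .xlsx (xlrd for legacy .xls);
# read_parquet needs pyarrow or fastparquet.
READERS = {
    ".csv": "read_csv",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".parquet": "read_parquet",
}


def reader_name(path: str) -> str:
    """Return the name of the pandas reader for a file, based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return READERS[suffix]


def load_any(path: str):
    """Load a CSV, Excel, or Parquet file into a pandas DataFrame."""
    import pandas as pd  # imported lazily so reader_name() works without pandas

    return getattr(pd, reader_name(path))(path)
```

Keeping the extension lookup separate from the actual read makes unsupported inputs fail fast with a clear message before any file I/O happens.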
Prerequisites:
- Python 3.10 or higher
- pip package manager
```bash
# 1. Clone the repository
git clone https://github.com/rizzshi/AiInsight.git
cd AiInsight

# 2. Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Set up environment variables (optional for GPT-4)
cp .env.example .env
# Edit .env and add your OpenAI API key
```

To launch the web UI:

```bash
streamlit run streamlit_app.py
```

Then open your browser to http://localhost:8501 and:
- Upload your dataset (CSV, Excel, or Parquet)
- Configure settings in the sidebar
- Click "Generate AI Report"
- Download your professional PDF report
Command-line usage:

```bash
# Generate sample dataset first (optional)
python -c "from src.utils import generate_sample_sales_data; generate_sample_sales_data(1000).to_csv('data/sample_dataset.csv', index=False)"

# Run analysis on sample data
python main.py data/sample_dataset.csv

# Or analyze your own dataset
python main.py path/to/your/data.csv --author "Your Name"

# With GPT-4 (requires API key)
python main.py data/your_data.csv --api-key sk-your-key-here --verbose
```

Python API usage:

```python
import pandas as pd
from src.eda_engine import perform_eda
from src.kpi_extractor import extract_kpis
from src.ai_narrator import generate_narrative
from src.pdf_generator import generate_pdf_report

# Load your data
df = pd.read_csv('your_data.csv')

# Run automated analysis
eda_summary = perform_eda(df)
kpis = extract_kpis(df, eda_summary['dataset_info']['dataset_type'])
narrative = generate_narrative(eda_summary, kpis)

# Generate PDF report
pdf_path = generate_pdf_report(eda_summary, kpis, narrative)
print(f"Report saved to: {pdf_path}")
```

A synthetic sales dataset with 1,000 records is included for testing:
```bash
# Generate sample data
python src/utils.py

# Analyze sample data
python main.py data/sample_dataset.csv --verbose
```

Sample Dataset Schema:
- Transaction ID, Date, Product, Category
- Region, Channel, Quantity, Pricing
- Revenue, Discounts, Profit Margins
- Customer IDs
The system analyzes column names and data patterns to automatically classify datasets:
- Sales: Revenue, products, quantities, pricing
- Finance: Transactions, balances, debits/credits
- Customer: Churn, segments, lifetime value
- General: Fallback for other dataset types
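A minimal sketch of column-name-based classification, assuming simple keyword matching. The keyword lists and `detect_dataset_type` are hypothetical; the project's detector may also weigh data patterns, as described above.

```python
# Hypothetical keyword lists; the project's actual rules may use other terms
# and may also inspect data values, not just column names.
TYPE_KEYWORDS = {
    "sales": ["revenue", "product", "quantity", "price", "sales"],
    "finance": ["transaction", "balance", "debit", "credit", "account"],
    "customer": ["churn", "segment", "lifetime", "ltv", "customer"],
}


def detect_dataset_type(columns: list[str]) -> str:
    """Score each type by how many column names contain one of its keywords."""
    names = [c.lower() for c in columns]
    scores = {
        dtype: sum(any(k in name for k in keywords) for name in names)
        for dtype, keywords in TYPE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties resolve to the first type listed
    # Fall back to "general" when no keyword matched at all.
    return best if scores[best] > 0 else "general"
```

Counting matching columns rather than stopping at the first hit keeps a sales table with a single `Customer ID` column from being misread as customer data.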
Comprehensive exploratory data analysis includes:
- Missing value detection and quantification
- Statistical summaries (mean, median, std dev, quartiles)
- Correlation analysis with heatmap visualization
- Distribution plots for numeric and categorical features
Context-aware KPI calculation based on dataset type:
| Dataset Type | Example KPIs |
|---|---|
| Sales | Total Revenue, Average Order Value, Top Products, Margin Analysis |
| Finance | Total Balance, Net Position, Transaction Volume, Account Metrics |
| Customer | Churn Rate, Retention Rate, Avg Customer Value, Segment Distribution |
| General | Data Completeness, Record Count, Feature Diversity |
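As a worked example of the Sales row, the sketch below computes Total Revenue, Average Order Value (total revenue divided by the number of distinct orders), and Top Products. The record fields `order_id`, `product`, and `revenue` are assumptions, and margin analysis is omitted.

```python
from collections import defaultdict


def sales_kpis(rows: list[dict], top_n: int = 3) -> dict:
    """Compute a few sales KPIs from records with hypothetical fields
    'order_id', 'product' and 'revenue'."""
    total_revenue = sum(r["revenue"] for r in rows)
    orders = {r["order_id"] for r in rows}  # distinct orders
    by_product = defaultdict(float)
    for r in rows:
        by_product[r["product"]] += r["revenue"]
    top_products = sorted(by_product, key=by_product.get, reverse=True)[:top_n]
    return {
        "Total Revenue": round(total_revenue, 2),
        # Average Order Value = total revenue / number of distinct orders
        "Average Order Value": round(total_revenue / len(orders), 2) if orders else 0.0,
        "Top Products": top_products,
    }
```

For example, four line items totalling 300.0 across three orders give an Average Order Value of 100.0.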
Two-tier intelligent narrative system:
Tier 1: GPT-4 (when API key provided)
- Executive summary (3-5 sentences)
- Key findings (4-6 bullet points)
- Actionable recommendations
- Risks and limitations
Tier 2: Rule-Based Fallback
- Pattern-based insights
- Statistical observations
- Domain-specific recommendations
- Data quality assessment
Professional report generation with:
- Eviden branding (Created by Algorzen)
- Title page with metadata
- KPI summary tables
- Visualizations (heatmaps, distributions)
- AI-generated narratives
- Data quality appendix
Environment variables (`.env`):

```bash
# OpenAI Configuration (optional)
OPENAI_API_KEY=sk-your-api-key-here
OPENAI_MODEL=gpt-4-turbo-preview

# Report Branding (optional)
COMPANY_NAME=Algorzen
AUTHOR_NAME=Rishi Singh
```

Command-line options:

```text
python main.py --help

Arguments:
  input_file      Path to dataset (CSV, Excel, Parquet)

Options:
  --output DIR    Output directory (default: reports/)
  --author NAME   Report author (default: Rishi Singh)
  --api-key KEY   OpenAI API key for GPT-4
  --no-pdf        Skip PDF generation
  --verbose       Show detailed progress
```

Project structure:

```text
AiInsight/
├── src/
│   ├── eda_engine.py        # Automated EDA engine
│   ├── kpi_extractor.py     # KPI calculation module
│   ├── ai_narrator.py       # GPT-4 narrative generator
│   ├── pdf_generator.py     # PDF report builder
│   └── utils.py             # Helper functions
├── data/
│   └── sample_dataset.csv   # Sample sales data (1000 records)
├── reports/
│   ├── assets/              # Generated charts and visualizations
│   ├── Eviden_Insight_Report_YYYYMMDD.pdf
│   └── report_metadata.json # Report metadata
├── main.py                  # CLI entry point
├── streamlit_app.py         # Web UI application
├── requirements.txt         # Python dependencies
├── .env.example             # Environment variable template
└── README.md                # This file
```
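The options printed by `python main.py --help` map naturally onto `argparse`. The sketch below is an assumed reconstruction of such a parser, not the actual `main.py`; `build_parser` is a hypothetical name, and the fallback to the `OPENAI_API_KEY` environment variable is an assumption.

```python
import argparse
import os


def build_parser() -> argparse.ArgumentParser:
    """Argument parser mirroring the options documented for main.py."""
    parser = argparse.ArgumentParser(description="Autonomous Business Intelligence System")
    parser.add_argument("input_file", help="Path to dataset (CSV, Excel, Parquet)")
    parser.add_argument("--output", metavar="DIR", default="reports/",
                        help="Output directory (default: reports/)")
    parser.add_argument("--author", metavar="NAME", default="Rishi Singh",
                        help="Report author (default: Rishi Singh)")
    # Assumption: use OPENAI_API_KEY from the environment when the flag is absent.
    parser.add_argument("--api-key", metavar="KEY",
                        default=os.environ.get("OPENAI_API_KEY"),
                        help="OpenAI API key for GPT-4")
    parser.add_argument("--no-pdf", action="store_true", help="Skip PDF generation")
    parser.add_argument("--verbose", action="store_true", help="Show detailed progress")
    return parser
```

Usage: `args = build_parser().parse_args()`, after which `args.no_pdf` and `args.api_key` hold the dashed options under their underscore names.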
Use cases:
- Automate routine data analysis reports
- Generate executive summaries for stakeholders
- Standardize reporting across departments
- Quick exploratory data analysis
- Automated KPI tracking
- Professional report generation
- Client data analysis and reporting
- Strategic insights with AI narratives
- Branded deliverables
- Cost-effective business intelligence
- No-code analytics for non-technical users
- Scalable reporting infrastructure
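To force a specific dataset type instead of relying on auto-detection:

```python
import pandas as pd
from src.eda_engine import EDAEngine

# Force specific dataset type
df = pd.read_csv('your_data.csv')
engine = EDAEngine(df)
engine.dataset_type = 'finance'  # Override auto-detection
summary = engine.run_full_eda()
```

For scheduled reporting (requires the third-party `schedule` package):

```python
# Example: Daily automated reporting
import sys
import time

import schedule  # pip install schedule

from main import main


def extract_from_database():
    """Placeholder: replace with your own ETL step returning a pandas DataFrame."""
    raise NotImplementedError


def daily_report():
    # Your ETL pipeline
    df = extract_from_database()
    df.to_csv('temp_data.csv', index=False)
    # Generate report by passing CLI arguments to main() through sys.argv
    sys.argv = ['main.py', 'temp_data.csv', '--verbose']
    main()


schedule.every().day.at("09:00").do(daily_report)
while True:  # keep the process alive so the scheduled job can run
    schedule.run_pending()
    time.sleep(60)
```

To add your own KPIs, subclass `KPIExtractor`:

```python
from src.kpi_extractor import KPIExtractor


class CustomKPIExtractor(KPIExtractor):
    def extract_custom_kpis(self):
        kpis = {}
        # Your custom KPI logic here; calculate_custom_metric is your own function
        kpis['Custom Metric'] = calculate_custom_metric(self.df)
        return kpis
```

Contributions are welcome! Please feel free to submit a Pull Request.

```bash
# Clone and setup
git clone https://github.com/rizzshi/AiInsight.git
cd AiInsight

# Install development dependencies
pip install -r requirements.txt
pip install pytest black flake8

# Run tests (when available)
pytest tests/

# Format code
black src/ *.py
```

This project is licensed under the MIT License. See the LICENSE file for details.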
Author: Rishi Singh
Eviden (Created by Algorzen)
- GitHub: @rizzshi
- Project: DataSphere/AiInsight
Acknowledgments:
- OpenAI for GPT-4 API
- ReportLab for PDF generation
- Streamlit for web UI framework
- The open-source data science community
For questions, issues, or feature requests:
- Open an issue on GitHub
- Contact: Rishi Singh via GitHub
Roadmap:
- Multi-language narrative support
- Custom branding templates
- Real-time data source connectors (SQL, APIs)
- Automated email report delivery
- Interactive dashboard mode
- Advanced statistical tests
- Time series forecasting
- Anomaly detection
- Collaborative annotations
- Report version control
