diff --git a/.gitattributes b/.gitattributes
index 38cd5be..0626369 100644
--- a/.gitattributes
+++ b/.gitattributes
@@ -1,3 +1,4 @@
 .gitignore merge=ours
 README.md merge=ours
 docker-compose.yml merge=ours
+backend/** merge=ours
diff --git a/.github/workflows/main.yml b/.github/workflows/python-ci-cd.yml
similarity index 100%
rename from .github/workflows/main.yml
rename to .github/workflows/python-ci-cd.yml
diff --git a/.gitignore b/.gitignore
index cbca715..7233e8f 100644
--- a/.gitignore
+++ b/.gitignore
@@ -58,6 +58,10 @@ tempCodeRunnerFile.py
 unit_test.py
 testing_workflow.py
 *.yaml
+local.settings.json
+playwright_browser
+__pycache__
+docker-compose.yml
 scripts/
 playwright_browser
diff --git a/README.md b/README.md
index 38ac21e..3ec168a 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,11 @@
 # CiteMe - Automatic Citation Generation System
-CiteMe is a modern, full-stack application designed to help students generate references and in-line citations and references efficiently. The system provides intelligent citation suggestions, reference management, and seamless integration with academic databases.
+CiteMe is a modern, full-stack application designed to help students generate references and in-line citations efficiently. The system provides intelligent citation suggestions, reference management, and seamless integration with academic databases.
-Students do not have to worry about searching for sources to back essays and thesis. This web app will search the web , format your document with intext citation and include the references, sources and metrics to grade the credibility of the sources.
+Students do not have to worry about searching for sources to back essays and thesis. This web app will search the web, format your document with intext citation and include the references, sources and metrics to grade the credibility of the sources.
 The webapp also offers the choice of providing your own sources, in forms of urls, texts and pdfs and is able to use these sources to format your essays/thesis with intext citation and references in any citation format.
-
 🌐 **Live Demo**: [CiteMe Editor](https://cite-me-wpre.vercel.app/editor)
 ## 🚀 Features
@@ -17,6 +16,10 @@ The webapp also offers the choice of providing your own sources, in forms of url
 - **Real-time Metrics**: Track citation impact and academic metrics
 - **Modern UI**: Responsive and intuitive user interface
 - **API Integration**: Seamless integration with academic databases and search engines
+- **Web Scraping**: Intelligent web scraping with Playwright for source extraction
+- **Vector Search**: Efficient document retrieval using Pinecone vector database
+- **AI-Powered**: Integration with multiple AI models (Azure, Groq, Gemini) for citation generation
+- **Credibility Scoring**: Automated source credibility assessment
 ## 📁 Project Structure
@@ -29,10 +32,13 @@ CiteMe/
 │   └── dist/               # Production build
 ├── backend/
 │   ├── mainService/        # Core citation service
-│   └── metricsService/     # Analytics and metrics service
-├── .github/                # GitHub workflows and templates
-├── docker-compose.yml      # Docker services configuration
-└── README.md               # Project documentation
+│   │   ├── src/            # Source code
+│   │   ├── scripts/        # Utility scripts
+│   │   └── config/         # Configuration files
+│   └── metricsService/     # Analytics and metrics service
+├── .github/                # GitHub workflows and templates
+├── docker-compose.yml      # Docker services configuration
+└── README.md               # Project documentation
 ```
 ## 🏗️ Architecture
@@ -41,7 +47,14 @@ The application is built using a microservices architecture with three main comp
 1. **Frontend Service**: Vue.js 3 application hosted on Vercel
 2. **Main Service**: FastAPI-based backend service handling core citation functionality
+   - Web scraping with Playwright
+   - Vector search with Pinecone
+   - AI model integration (Azure, Groq, Gemini)
+   - Citation generation and formatting
 3. **Metrics Service**: FastAPI-based service for handling academic metrics and analytics
+   - Source credibility assessment
+   - Citation impact analysis
+   - Academic metrics tracking
 ## 🛠️ Tech Stack
@@ -56,12 +69,34 @@ The application is built using a microservices architecture with three main comp
 ### Backend
 - Python 3.11
 - FastAPI
-- Pinecone
-- Gemini
+- Pinecone (Vector Database)
+- Gemini (Google AI)
+- Groq
 - Azure hosted LLMs
+- Mixbread (Reranking)
 - LangChain
+- Playwright (Web Scraping)
 - Various AI/ML libraries
+## 🔑 Environment Setup
+
+Before running the services, you'll need to set up the following API keys:
+
+1. Google API Keys:
+   - `CX`: Google Programmable Search Engine ID
+   - `GPSE_API_KEY`: Google Programmable Search Engine API key
+   - `GOOGLE_API_KEY`: Gemini API key
+
+2. AI Service Keys:
+   - `GROQ_API_KEY`: Groq API key
+   - `PINECONE_API_KEY`: Pinecone vector database
+   - `MIXBREAD_API_KEY`: Mixbread reranking service
+   - `AZURE_MODELS_ENDPOINT`: Azure endpoint for citation generation
+
+3. Optional Services:
+   - `CREDIBILITY_API_URL`: URL for the credibility metrics service
+   - `SERVERLESS`: Set to TRUE for serverless mode
+
 ## 🚀 Getting Started
 ### Prerequisites
@@ -78,9 +113,10 @@ git clone https://github.com/yourusername/citeme.git
 cd citeme
 ```
-2. Create `.env` files in both service directories:
-   - `backend/mainService/.env`
-   - `backend/metricsService/.env`
+2. Create a `.env` file in the root directory with all required API keys:
+```bash
+cp backend/mainService/.env.example .env
+```
 3. Build and run the services using Docker Compose:
 ```bash
@@ -174,11 +210,23 @@ cd ../metricsService
 pytest
 ```
+## 🔄 CI/CD Pipeline
+
+The project uses GitHub Actions for continuous integration and deployment:
+
+- **Automated Testing**: Runs on every push to main and pull requests
+- **Python 3.11**: Uses the latest Python 3.11 environment
+- **Test Dependencies**: Installs both main and test requirements
+- **PR Management**: Automatically closes failed PRs with explanatory comments
+- **Environment Variables**: Securely manages API keys and configuration
+
+The pipeline can be found in `.github/workflows/python-ci-cd.yml`.
+
 ## 📦 Docker Images
 The backend services have their own Dockerfiles:
-- `backend/mainService/Dockerfile`: Python-based main service
+- `backend/mainService/Dockerfile`: Python-based main service with Playwright support
 - `backend/metricsService/Dockerfile`: Python-based metrics service
 ## 🤝 Contributing
diff --git a/backend/mainService/Dockerfile b/backend/mainService/Dockerfile
index 084364c..db7cd05 100644
--- a/backend/mainService/Dockerfile
+++ b/backend/mainService/Dockerfile
@@ -2,12 +2,13 @@ FROM python:3.11-slim
 WORKDIR /app
-# Install system dependencies
+# Install system dependencies including Playwright requirements
 # Installs essential tools for compiling software from source, often needed for Python package dependencies.(build-essential)
 # Removes the package lists downloaded during the update to reduce the image size.
 RUN apt-get update && apt-get install -y \
     build-essential \
     cron \
+    wget \
     && rm -rf /var/lib/apt/lists/*
 # Set the PATH environment variable to include /app
@@ -19,18 +20,18 @@ COPY requirements.txt .
 # Install Python dependencies
 RUN pip install --no-cache-dir -r requirements.txt
+# Install Playwright and its dependencies
+RUN playwright install && playwright install-deps
+
+# Create necessary directories
+RUN mkdir -p /app/config /tmp/downloads
+
 # Copy the source code
 COPY ./scripts/ /app/scripts/
 COPY ./src/ /app/src/
 COPY ./app.py /app/app.py
 COPY ./__init__.py /app/__init__.py
-# Create a directory for runtime configuration
-RUN mkdir -p /app/config
-
-# Install playwright
-RUN playwright install && playwright install-deps
-
 # Expose the port the app runs on
 EXPOSE 8000
diff --git a/backend/metricsService/Dockerfile b/backend/metricsService/Dockerfile
index 5cf1198..a79f36e 100644
--- a/backend/metricsService/Dockerfile
+++ b/backend/metricsService/Dockerfile
@@ -3,28 +3,20 @@ FROM python:3.11-slim
 WORKDIR /app
 # Install system dependencies
-# Installs essential tools for compiling software from source, often needed for Python package dependencies.(build-essential)
-# Removes the package lists downloaded during the update to reduce the image size.
 RUN apt-get update && apt-get install -y \
     build-essential \
     && rm -rf /var/lib/apt/lists/*
-# Set the PATH environment variable to include /app
-ENV PATH="/app:${PATH}"
-
-# Copy requirements first to leverage Docker cache
+# Copy requirements first
 COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
-# Install Python dependencies
-RUN pip install --no-cache-dir -r requirements.txt
-
-# Copy the rest of the application
+# Copy the application
 COPY ./src/ /app/src/
-RUN cd /app/src
+# Create necessary directories
+RUN mkdir -p /app/config
-# Expose the port the app runs on
 EXPOSE 8000
-# Command to run the application
 CMD ["uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
\ No newline at end of file
diff --git a/backend/metricsService/README.md b/backend/metricsService/README.md
deleted file mode 100644
index 04ae6d9..0000000
Binary files a/backend/metricsService/README.md and /dev/null differ
diff --git a/docker-compose.yml b/docker-compose.yml
deleted file mode 100644
index fd369d7..0000000
--- a/docker-compose.yml
+++ /dev/null
@@ -1,36 +0,0 @@
-version: '3.8'
-
-services:
-  main_service:
-    build:
-      context: ./backend/mainService
-      dockerfile: Dockerfile
-    ports:
-      - "9020:8000"
-    env_file:
-      - ./backend/mainService/.env
-    environment:
-      - CREDIBILITY_API_URL=http://metrics_service:8000/api/v1/credibility/batch
-    volumes:
-      - ./backend/mainService:/app
-    networks:
-      - cite_me
-    depends_on:
-      - metrics_service
-
-  metrics_service:
-    build:
-      context: ./backend/metricsService
-      dockerfile: Dockerfile
-    ports:
-      - "9050:8000"
-    env_file:
-      - ./backend/metricsService/.env
-    volumes:
-      - ./backend/metricsService:/app
-    networks:
-      - cite_me
-
-networks:
-  cite_me:
-    driver: bridge
\ No newline at end of file
diff --git a/frontend/src/components/MainPageHeader.vue b/frontend/src/components/MainPageHeader.vue
index f2d985c..776a958 100644
--- a/frontend/src/components/MainPageHeader.vue
+++ b/frontend/src/components/MainPageHeader.vue
@@ -85,7 +85,7 @@ const toggleView = () => {
       type="text"
       placeholder="Untitled"
       required
-      maxlength="50"
+      maxlength="150"
     />
diff --git a/frontend/vercel.json b/frontend/vercel.json
new file mode 100644
index 0000000..8b66688
--- /dev/null
+++ b/frontend/vercel.json
@@ -0,0 +1,7 @@
+{
+    "rewrites": [
+      { "source": "/editor", "destination": "/index.html" },
+      { "source": "/preview", "destination": "/index.html" }
+    ]
+  }
+
\ No newline at end of file
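
The README hunk in this diff lists the environment variables the services expect, split into required keys and optional ones. A minimal sketch of a startup check for them follows. The variable names are copied from the README. The helper `missing_keys` is hypothetical and does not appear anywhere in this diff, and treating a blank value as missing is an assumption.

```python
import os

# Names copied from the README's "Environment Setup" section in this diff.
REQUIRED_KEYS = (
    "CX",
    "GPSE_API_KEY",
    "GOOGLE_API_KEY",
    "GROQ_API_KEY",
    "PINECONE_API_KEY",
    "MIXBREAD_API_KEY",
    "AZURE_MODELS_ENDPOINT",
)

# Listed under "Optional Services" in the README, so they are not enforced here.
OPTIONAL_KEYS = ("CREDIBILITY_API_URL", "SERVERLESS")


def missing_keys(env=None):
    """Return the required keys that are unset or blank in `env`.

    Hypothetical helper, not part of the repository. `env` defaults to
    the process environment; pass a dict to check other mappings.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]
```

A service entry point could call `missing_keys()` before starting and exit with the returned names in the error message, which would surface an incomplete `.env` file earlier than the first failed API call.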