SmartRedact is an AI-powered document redaction platform that automatically detects and redacts sensitive information in PDF documents using machine learning models.
- AI-Powered Detection: Advanced NLP models detect sensitive entities like SSNs, emails, phone numbers, addresses, and more
- PDF Redaction: Secure visual redaction of PDF documents with black bars
- Interactive Review: Review and select which entities to redact before processing
- Audit Trail: Complete audit logs of all redaction activities
- Modern UI: Beautiful, responsive interface built with React and Tailwind CSS
- Real-time Processing: Fast document processing with progress indicators
SmartRedact/
├── backend/ # FastAPI Backend
│   ├── app.py # Main application
│   ├── requirements.txt # Python dependencies
│   ├── uploads/ # Document storage
│   ├── venv/ # Virtual environment
│   └── README.md # Backend documentation
├── src/ # React Frontend
│   ├── components/ # React components
│   ├── hooks/ # Custom hooks
│   ├── services/ # API services
│   └── pages/ # Page components
├── package.json # Frontend dependencies
├── vite.config.ts # Vite configuration
└── README.md # This file
- Frontend: React + TypeScript + Vite + Tailwind CSS
- Backend: FastAPI + Python (in the `backend/` folder)
- AI Models: Transformers (BERT-based NER) + spaCy + Regex patterns
- Document Processing: PyMuPDF (PDF), python-docx (Word), Tesseract (OCR)
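
To make the regex layer of the detection stack concrete, here is a minimal sketch in Python. The patterns, the `PATTERNS` table, and the `detect_entities` function are illustrative assumptions, not the project's actual code. The output fields mirror the entity shape used by the `/api/redact` request body.

```python
import re
from typing import Dict, List

# Hypothetical patterns for illustration only; the project's real
# patterns are not shown in this README.
PATTERNS: Dict[str, re.Pattern] = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def detect_entities(text: str) -> List[dict]:
    """Return regex matches as entity dicts, ordered by position in the text."""
    entities = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append({
                "text": match.group(),
                "type": entity_type,
                "start": match.start(),
                "end": match.end(),
                "selected": True,  # assumed default: redact unless deselected
            })
    return sorted(entities, key=lambda e: e["start"])
```

Regexes like these only cover rigidly formatted identifiers. Names and addresses would presumably come from the NER models instead.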
- Python 3.8+
- Node.js 16+
- npm or yarn
# Clone the repository
git clone <repository-url>
cd SmartRedact
# Setup and start both frontend and backend
npm install
# Run the backend in a background subshell so the current directory stays at the project root
(cd backend && python3 -m venv venv && source venv/bin/activate && pip install -r requirements.txt && python app.py) &
npm run dev

# Navigate to backend folder
cd backend
# Create virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Start the backend
python app.py

# Install dependencies
npm install
# Start the development server
npm run dev

- Frontend: http://localhost:5173
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/api/docs
- Alternative API Docs: http://localhost:8000/api/redoc
GET /api/health
Returns API status and available features.
POST /api/upload
Content-Type: multipart/form-data
Body: file (PDF, DOCX, or image)
Uploads a document and returns detected entities.
POST /api/redact
Content-Type: application/json
Body: {
"document_id": "string",
"entities": [{"text": "string", "type": "string", "start": number, "end": number, "selected": boolean}]
}
Redacts selected entities from the document.
GET /api/download/{document_id}
Downloads the redacted document.
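
As a rough sketch of how a client might call the redact and download endpoints above, the following uses only the Python standard library. The helper names `build_redact_request` and `redact_and_download` are hypothetical, and the response body of `/api/redact` is not documented here, so the sketch ignores it.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/api"  # backend URL listed in this README


def build_redact_request(document_id, entities):
    """Build (but do not send) a POST /api/redact request."""
    body = {"document_id": document_id, "entities": entities}
    return urllib.request.Request(
        f"{API_BASE}/redact",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def redact_and_download(document_id, entities, out_path):
    """Send the redact request, then save the result. Needs a running backend."""
    with urllib.request.urlopen(build_redact_request(document_id, entities)) as resp:
        resp.read()  # response shape is undocumented here, so it is ignored
    url = f"{API_BASE}/download/{document_id}"
    with urllib.request.urlopen(url) as resp, open(out_path, "wb") as fh:
        fh.write(resp.read())
```

Each entity carries its own `selected` flag, so a client can send the full list returned by the upload step and toggle only that flag.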
Create a .env file in the root directory:
# API Configuration
API_BASE_URL=http://localhost:8000/api
# Development
NODE_ENV=development
VITE_API_URL=http://localhost:8000/api

The FastAPI backend can be configured by modifying backend/app.py:
- CORS Origins: Update allowed origins in the CORS middleware
- File Upload Limits: Modify file size limits
- AI Models: Enable/disable specific AI models
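
A file upload limit of the kind mentioned above could be enforced with a small check like the one below. This is a sketch under stated assumptions: the extension set, the 10 MiB cap, and the `validate_upload` helper are all hypothetical, since this README names the supported formats (PDF, DOCX, images) but not the exact extensions or the size limit.

```python
from pathlib import Path

# Assumed values: neither the extension list nor the limit is given in this README.
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".png", ".jpg", ".jpeg"}
MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB cap


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file type or size is not acceptable."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError(f"File is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
```

In a FastAPI handler, a failure like this would typically be translated into an HTTP 4xx response before any processing starts.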
# Navigate to backend folder
cd backend
# Activate virtual environment
source venv/bin/activate
# Run tests (if available)
python -m pytest tests/

# Run tests
npm test
# Run tests with coverage
npm run test:coverage

- FastAPI: Modern web framework for building APIs
- PyMuPDF: PDF processing and manipulation
- Transformers: Hugging Face transformers for NLP
- spaCy: Advanced NLP library
- Tesseract: OCR engine for image text extraction
- Pillow: Image processing library
- python-docx: Word document processing
- React: UI library
- TypeScript: Type-safe JavaScript
- Vite: Fast build tool and dev server
- Tailwind CSS: Utility-first CSS framework
- Framer Motion: Animation library
- Axios: HTTP client
- Lucide React: Icon library
# Build frontend
npm run build
# The built files will be in the `dist/` directory

# Dockerfile example
FROM python:3.9-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
tesseract-ocr \
libtesseract-dev \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy application
COPY . .
# Expose port
EXPOSE 8000
# Run application
CMD ["python", "app.py"]

- No Data Persistence: Documents are processed in memory and not stored
- Secure Redaction: Uses PyMuPDF's redaction features, which remove the underlying text rather than only covering it with a black bar
- CORS Protection: Configured CORS policies
- Input Validation: Comprehensive input validation using Pydantic
- Backend won't start
  - Check if Python 3.8+ is installed
  - Navigate to backend folder: `cd backend`
  - Verify virtual environment is activated
  - Install missing dependencies: `pip install -r requirements.txt`
- Frontend won't start
  - Check if Node.js 16+ is installed
  - Install dependencies: `npm install`
  - Clear cache: `npm run dev -- --force`
- API connection issues
  - Verify backend is running on port 8000
  - Check CORS configuration
  - Verify proxy settings in `vite.config.ts`
- Document processing fails
  - Check file format (PDF, DOCX, images supported)
  - Verify file size limits
  - Check AI model availability
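
For the "verify backend is running on port 8000" check, a quick standard-library probe is sketched below. The `is_port_open` helper is hypothetical and only tests that something accepts TCP connections on the port. It does not confirm that the listener is the SmartRedact API.

```python
import socket


def is_port_open(host: str = "127.0.0.1", port: int = 8000, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

If this returns True but the frontend still cannot reach the API, the CORS and proxy settings are the next things to check.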
Enable debug logging by setting environment variables:
export DEBUG=1
export LOG_LEVEL=DEBUG

This project is licensed under the MIT License - see the LICENSE file for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
For support and questions:
- Create an issue in the repository
- Check the API documentation at http://localhost:8000/api/docs
- Review the troubleshooting section above
SmartRedact AI - Secure document redaction powered by artificial intelligence.