A CLI tool to automatically rename receipt files by extracting vendor, date, and amount information using OCR and optional AI-powered extraction.
- Smart Text Extraction - Extract text from PDFs (native text or image-based) and images
- OCR Support - Built-in Tesseract OCR for image-based receipts
- AI-Powered Extraction - Optional LLM integration for improved accuracy
  - Anthropic (Claude)
  - OpenAI (ChatGPT)
  - Ollama (local models)
- Configurable Naming - Customize date format, separators, and filename templates
- Restore Function - Revert renamed files to their original names
- Watch Mode - Automatically process new files as they appear
- Memory Feature - Learns vendor name corrections and applies them automatically
- Confidence Scoring - Shows extraction confidence to help identify uncertain results
- Multi-language - Support for multiple OCR languages
- Manifest Tracking - Keeps track of all renames for easy restoration
# Clone the repository
git clone https://github.com/yourusername/pappetizer.git
cd pappetizer
# Install dependencies
npm install
# Install Tesseract OCR (required)
# macOS
brew install tesseract
# Ubuntu/Debian
sudo apt-get install tesseract-ocr
# Link globally (optional)
npm link

Before processing receipts, you can configure Pappetizer's default behavior using the interactive wizard. This saves your preferences so you don't need to specify them each time.

pappetizer configure

The wizard guides you through setting:
- Date format - How dates appear in filenames (YYYYMMDD, YYYY-MM-DD, DD-MM-YYYY, etc.)
- Filename separator - Character(s) between name parts (default: -)
- Filename template - Structure of the output filename
- Default currency - Fallback when currency can't be detected (USD, EUR, CHF, etc.)
- OCR language - Language for text recognition (English, German, French, etc.)
- LLM settings - AI provider, API keys, and model selection
Configuration is stored in ~/.config/pappetizer/config.json.
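
To make the date format setting concrete, here is a minimal sketch of how a format string such as YYYYMMDD or DD-MM-YYYY could be expanded. The function `formatReceiptDate` is hypothetical and is not taken from Pappetizer's source; the real formatter may support more tokens or behave differently.

```typescript
// Hypothetical helper (not Pappetizer's actual code): expands the YYYY, MM,
// and DD tokens of a date format string into zero-padded numbers.
function formatReceiptDate(year: number, month: number, day: number, pattern: string): string {
  // Left-pad with zeros to the requested width, e.g. pad(3, 2) gives "03".
  const pad = (value: number, width: number): string => ("0000" + value).slice(-width);
  return pattern
    .replace("YYYY", pad(year, 4))
    .replace("MM", pad(month, 2))
    .replace("DD", pad(day, 2));
}

console.log(formatReceiptDate(2024, 3, 15, "YYYYMMDD"));   // 20240315
console.log(formatReceiptDate(2024, 3, 15, "YYYY-MM-DD")); // 2024-03-15
console.log(formatReceiptDate(2024, 3, 15, "DD-MM-YYYY")); // 15-03-2024
```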
Scan a directory for receipt files and rename them based on extracted data:
pappetizer clean ./receipts

For each file, Pappetizer extracts the vendor name, date, and amount, then presents a suggested new filename. You can accept, edit, or skip each suggestion.
Rename just one specific receipt file:
pappetizer clean ./receipt.pdf

Useful when you have a single receipt to process without scanning an entire directory.
See what would be renamed without actually modifying any files:
pappetizer clean ./receipts --dry-run

This is helpful to verify the extraction quality before committing to changes. The output shows proposed renames but leaves all files untouched.
Recursively process all receipt files in nested subdirectories:
pappetizer clean ./receipts -r

Without this flag, only files in the specified directory are processed. Hidden directories (starting with .) are always skipped.
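
The selection rule described above can be sketched as a filter over relative paths. This is an illustration only: `isInsideHiddenDirectory` and `selectFiles` are hypothetical names, and the sketch assumes forward-slash paths relative to the directory passed on the command line.

```typescript
// Hypothetical filter: true when any directory segment of a relative path
// starts with "." (the "." and ".." segments themselves are not counted).
function isInsideHiddenDirectory(relativePath: string): boolean {
  const segments = relativePath.split("/").filter((s) => s.length > 0);
  const directories = segments.slice(0, -1); // drop the file name itself
  return directories.some((dir) => dir.charAt(0) === "." && dir !== "." && dir !== "..");
}

// Keeps top-level files always; keeps nested files only when `recursive` is set.
// Files under hidden directories are dropped in both modes.
function selectFiles(relativePaths: string[], recursive: boolean): string[] {
  return relativePaths.filter((p) => {
    if (isInsideHiddenDirectory(p)) return false;
    const depth = p.split("/").filter((s) => s.length > 0).length;
    return recursive || depth === 1;
  });
}

const candidates = ["a.pdf", "2024/b.pdf", ".cache/c.pdf", "2024/.tmp/d.pdf"];
console.log(selectFiles(candidates, false)); // only "a.pdf"
console.log(selectFiles(candidates, true));  // "a.pdf" and "2024/b.pdf"
```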
Skip the confirmation prompt and automatically accept all rename suggestions:
pappetizer clean ./receipts -y

Best used after verifying extraction quality with --dry-run. Files with low confidence scores will still prompt for review to prevent incorrect renames.
Continuously monitor a directory and automatically process new files as they appear:
pappetizer clean ./receipts --watch

Useful for automating receipt processing. When a new file is added to the directory, Pappetizer detects it and processes it automatically. Press Ctrl+C to stop watching.
Re-process files that have already been renamed by Pappetizer:
pappetizer clean ./receipts --force

By default, Pappetizer tracks renamed files in a manifest and skips them on subsequent runs. Use --force to override this and re-extract data from previously processed files.
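
The skip-unless-forced behavior can be pictured as a lookup against the manifest. The manifest format is not documented here, so the `ManifestEntry` shape and the `shouldProcess` function below are assumptions made purely for illustration.

```typescript
// Hypothetical manifest entry; the real manifest may store different fields.
interface ManifestEntry {
  originalName: string; // name before Pappetizer renamed the file
  renamedTo: string;    // name Pappetizer gave the file
}

// A file is processed when it has not been renamed before, or when --force is set.
function shouldProcess(fileName: string, manifest: ManifestEntry[], force: boolean): boolean {
  const alreadyRenamed = manifest.some((entry) => entry.renamedTo === fileName);
  return force || !alreadyRenamed;
}

const manifestSample: ManifestEntry[] = [
  { originalName: "scan001.pdf", renamedTo: "20240315 - WALMART - 42.99 USD.pdf" },
];

console.log(shouldProcess("20240315 - WALMART - 42.99 USD.pdf", manifestSample, false)); // false
console.log(shouldProcess("20240315 - WALMART - 42.99 USD.pdf", manifestSample, true));  // true
console.log(shouldProcess("scan002.pdf", manifestSample, false));                        // true
```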
Options can be combined for powerful workflows:
# Preview recursive processing with auto-accept
pappetizer clean ./receipts -r --dry-run -y
# Watch a directory with recursive processing
pappetizer clean ./receipts -r --watch
# Force re-process with AI extraction
pappetizer clean ./receipts --force --use-llm

AI-powered extraction can improve accuracy, especially for receipts with complex layouts or unusual formatting. The LLM analyzes the OCR text and extracts structured data from it.
Use Anthropic's Claude models for extraction:
pappetizer clean ./receipts --use-llm --llm-provider anthropic --api-key sk-ant-...

Available models: claude-3-haiku (fastest), claude-3-5-haiku (balanced), claude-3-5-sonnet (best quality).
Use OpenAI's GPT models:
pappetizer clean ./receipts --use-llm --llm-provider openai --api-key sk-...

Available models: gpt-4o-mini (fastest), gpt-4o (balanced), gpt-4-turbo (best quality).
Run AI extraction locally without sending data to external services:
pappetizer clean ./receipts --use-llm --llm-provider ollama --model llama3.2

Requires Ollama running locally (default: http://localhost:11434). Specify a different host with --ollama-host.
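
One plausible way to resolve the Ollama host is sketched below. The precedence shown (the --ollama-host value first, then the OLLAMA_HOST environment variable, then the default) is an assumption, not documented behavior, and `resolveOllamaHost` is a hypothetical name. The environment is passed in as a plain object so the sketch stays self-contained.

```typescript
// Hypothetical resolution order (an assumption): flag, then env var, then default.
function resolveOllamaHost(
  flagValue: string | undefined,
  env: { [key: string]: string | undefined },
): string {
  const fromEnv = env["OLLAMA_HOST"];
  if (flagValue !== undefined && flagValue !== "") return flagValue;
  if (fromEnv !== undefined && fromEnv !== "") return fromEnv;
  return "http://localhost:11434"; // default stated in this README
}

console.log(resolveOllamaHost(undefined, {}));                                          // http://localhost:11434
console.log(resolveOllamaHost(undefined, { OLLAMA_HOST: "http://192.168.1.20:11434" })); // http://192.168.1.20:11434
console.log(resolveOllamaHost("http://gpu-box:11434", { OLLAMA_HOST: "http://other" })); // http://gpu-box:11434
```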
Override the default model for any provider:
# Use a specific Claude model
pappetizer clean ./receipts --use-llm --llm-provider anthropic --model claude-3-5-sonnet-20241022
# Use a specific Ollama model
pappetizer clean ./receipts --use-llm --llm-provider ollama --model mistral

Pappetizer keeps a manifest of all renames, allowing you to restore files to their original names at any time.
Restore all renamed files in a directory to their original names:
pappetizer restore ./receipts

For each file, you'll be asked to confirm the restoration. Files not in the manifest (not renamed by Pappetizer) are skipped.
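
Conceptually, a restore is the rename manifest read in reverse. The sketch below plans which files would be renamed back and which would be skipped. The `RenameRecord` shape and `planRestore` function are hypothetical, and the sketch only builds a plan without touching the file system.

```typescript
// Hypothetical manifest record; field names are assumptions for illustration.
interface RenameRecord {
  originalName: string;
  currentName: string;
}

interface RestoreStep {
  from: string;
  to: string;
}

// Splits the files in a directory into those that can be restored
// (they appear in the manifest) and those that are skipped (they do not).
function planRestore(
  filesInDirectory: string[],
  records: RenameRecord[],
): { steps: RestoreStep[]; skipped: string[] } {
  const steps: RestoreStep[] = [];
  const skipped: string[] = [];
  filesInDirectory.forEach((file) => {
    const match = records.filter((r) => r.currentName === file)[0];
    if (match !== undefined) {
      steps.push({ from: file, to: match.originalName });
    } else {
      skipped.push(file);
    }
  });
  return { steps, skipped };
}

const restorePlan = planRestore(
  ["20240315 - WALMART - 42.99 USD.pdf", "notes.txt"],
  [{ originalName: "scan001.pdf", currentName: "20240315 - WALMART - 42.99 USD.pdf" }],
);
console.log(restorePlan.steps);   // one step, back to "scan001.pdf"
console.log(restorePlan.skipped); // "notes.txt" is not in the manifest
```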
Restore all files without confirmation prompts:
pappetizer restore ./receipts -y

See what would be restored without making changes:

pappetizer restore ./receipts --dry-run

Restore files in all subdirectories:

pappetizer restore ./receipts -r

The default template is: {date}{sep}{vendor}{sep}{amount} {currency}{ext}
Available placeholders:
- {date} - Transaction date (formatted according to your date format setting)
- {vendor} - Vendor/store name (sanitized and uppercased)
- {amount} - Total amount (with 2 decimal places)
- {currency} - Currency code (3 letters, e.g., USD, EUR)
- {sep} - Configured separator
- {ext} - Original file extension (preserved)
Example outputs:
- Default: 20240315 - WALMART - 42.99 USD.pdf
- European date: 15-03-2024 - MIGROS - 23.50 CHF.pdf
- Custom separator: 2024.03.15_AMAZON_129.00_EUR.pdf
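
The placeholder substitution above can be sketched in a few lines. This is an illustrative reimplementation, not Pappetizer's actual code: `renderFilename` and `ReceiptFields` are hypothetical names, the date is assumed to be formatted already, and vendor sanitization is reduced to uppercasing.

```typescript
// Hypothetical field container; the date is assumed to be pre-formatted.
interface ReceiptFields {
  date: string;
  vendor: string;
  amount: number;
  currency: string;
  ext: string;
}

// Replaces each {placeholder} with its value; unknown placeholders are left as-is.
function renderFilename(template: string, sep: string, fields: ReceiptFields): string {
  const values: { [key: string]: string } = {
    date: fields.date,
    vendor: fields.vendor.toUpperCase(), // real sanitization is likely more thorough
    amount: fields.amount.toFixed(2),    // always two decimal places
    currency: fields.currency,
    sep: sep,
    ext: fields.ext,
  };
  return template.replace(/\{(\w+)\}/g, (match: string, key: string): string => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) return match;
    return values[key] as string;
  });
}

const defaultTemplate = "{date}{sep}{vendor}{sep}{amount} {currency}{ext}";
console.log(renderFilename(defaultTemplate, " - ", {
  date: "20240315", vendor: "Walmart", amount: 42.99, currency: "USD", ext: ".pdf",
})); // 20240315 - WALMART - 42.99 USD.pdf
```

Note that the third example output also places the separator between the amount and the currency, which implies a customized template (for instance `{date}{sep}{vendor}{sep}{amount}{sep}{currency}{ext}`) in addition to a custom separator.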
Instead of passing API keys on the command line, you can set them as environment variables:
# For Anthropic/Claude
export ANTHROPIC_API_KEY=sk-ant-...
# For OpenAI/ChatGPT
export OPENAI_API_KEY=sk-...
# For Ollama (custom host)
export OLLAMA_HOST=http://localhost:11434

With environment variables set, you can simply run:

pappetizer clean ./receipts --use-llm --llm-provider anthropic

- PDF (.pdf) - Native text extraction with automatic OCR fallback for image-based PDFs
- Images (.png, .jpg, .jpeg, .tiff, .tif, .bmp, .gif) - Full OCR processing
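
The format list above amounts to a dispatch on file extension. A minimal sketch follows, assuming extensions are matched case-insensitively (an assumption, not documented behavior). `pickStrategy` and the strategy labels are hypothetical.

```typescript
type ExtractionStrategy = "pdf-text-with-ocr-fallback" | "ocr" | "unsupported";

// Image extensions listed in this README.
const IMAGE_EXTENSIONS: string[] = [".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp", ".gif"];

// Picks an extraction strategy from the file extension alone.
function pickStrategy(fileName: string): ExtractionStrategy {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  if (ext === ".pdf") return "pdf-text-with-ocr-fallback";
  if (IMAGE_EXTENSIONS.indexOf(ext) !== -1) return "ocr";
  return "unsupported";
}

console.log(pickStrategy("receipt.pdf"));  // pdf-text-with-ocr-fallback
console.log(pickStrategy("IMG_0042.JPG")); // ocr
console.log(pickStrategy("notes.txt"));    // unsupported
```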
Pappetizer learns from your corrections. When you edit a vendor name (e.g., changing "AMAZON WEB SERVICES" to "AWS"), it remembers this preference and automatically applies it to future receipts from the same vendor.
Vendor aliases are stored globally in your configuration and work across all directories.
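
As a rough illustration of the memory feature, vendor aliases can be modeled as a lookup table keyed by the normalized extracted name. The names `rememberCorrection` and `applyAlias` and the normalization rule (trim and uppercase) are assumptions. How Pappetizer actually matches vendors is not described here.

```typescript
// Hypothetical alias table: normalized extracted name -> preferred name.
type VendorAliases = { [extractedName: string]: string };

function normalizeVendor(raw: string): string {
  return raw.trim().toUpperCase();
}

// Returns a new table with the correction added; the input is not mutated.
function rememberCorrection(aliases: VendorAliases, extracted: string, corrected: string): VendorAliases {
  const updated: VendorAliases = {};
  Object.keys(aliases).forEach((key) => {
    updated[key] = aliases[key] as string;
  });
  updated[normalizeVendor(extracted)] = corrected;
  return updated;
}

// Applies a stored alias if one exists, otherwise returns the normalized name.
function applyAlias(aliases: VendorAliases, extracted: string): string {
  const key = normalizeVendor(extracted);
  return Object.prototype.hasOwnProperty.call(aliases, key) ? (aliases[key] as string) : key;
}

const learned = rememberCorrection({}, "AMAZON WEB SERVICES", "AWS");
console.log(applyAlias(learned, "Amazon Web Services")); // AWS
console.log(applyAlias(learned, "Migros"));              // MIGROS
```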
Each extraction includes a confidence score (0-100%) based on:
- Whether all fields (vendor, date, amount, currency) were detected
- Whether AI extraction was used (higher confidence)
Files with low confidence scores prompt for manual review even when using -y (auto-accept), helping prevent incorrect renames.
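
The exact scoring formula is not given here, so the following is only a sketch of how the two factors could combine. The weights (20 points per detected field, 20 for AI extraction) and the review threshold of 60 are invented for illustration and are not Pappetizer's real values.

```typescript
// Fields that may or may not have been detected during extraction.
interface ExtractedFields {
  vendor?: string;
  date?: string;
  amount?: number;
  currency?: string;
}

// Hypothetical weights: 20 points per detected field, plus 20 when an LLM was used.
function confidenceScore(fields: ExtractedFields, usedLlm: boolean): number {
  const detected = [fields.vendor, fields.date, fields.amount, fields.currency]
    .filter((value) => value !== undefined).length;
  return detected * 20 + (usedLlm ? 20 : 0);
}

// Without -y every file prompts; with -y only low-confidence files do.
// The threshold of 60 is an assumed value.
function needsPrompt(score: number, autoAccept: boolean, threshold: number = 60): boolean {
  return !autoAccept || score < threshold;
}

const complete: ExtractedFields = { vendor: "WALMART", date: "20240315", amount: 42.99, currency: "USD" };
console.log(confidenceScore(complete, true));                        // 100
console.log(confidenceScore({ vendor: "WALMART" }, false));          // 20
console.log(needsPrompt(confidenceScore(complete, false), true));    // false
console.log(needsPrompt(confidenceScore({ vendor: "X" }, false), true)); // true
```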
Contributions are welcome! Please feel free to submit a Pull Request.
MIT