SuperSearchLocalDocs

A powerful full-text search system for local document collections. Search across DOCX, PDF, XLSX, CSV, and image files using a fast web interface powered by SQLite FTS5.

🚀 Standalone Applications - No Python Required!

SuperSearch Local Docs is distributed as three standalone Windows applications:

DocumentIndexer.exe - GUI for setup, configuration, and indexing documents
DocumentSearchGUI.exe - Desktop application for searching (recommended)
DocumentSearch.exe - Web server for searching (browser-based)

All applications work together and require no Python installation on end-user computers!

📖 See BUILD_INSTRUCTIONS.md to build the executables
📖 See DEPLOYMENT_GUIDE.md for deployment instructions

Quick Start for End Users

Run DocumentIndexer.exe → Configure server IP and index your documents
Run DocumentSearchGUI.exe → Search your documents in a desktop app
Done! No Python, no dependencies, no complicated setup.

Alternative: Run DocumentSearch.exe to use the browser-based interface instead.

Network Access: Configure your server IP address in DocumentIndexer.exe to allow access from other computers on your network.

For Developers

If you want to run from source or customize the code, see the Installation section below for Python-based development setup.

Features

Comprehensive search across:
- File names
- Folder and subfolder names
- Document content (text inside files)
Advanced filtering:
- Search scope (all, filenames only, folders only, content only)
- File type filters (Word, PDF, Excel, CSV, Images)
- Date range filtering
- File size filtering
- Multiple sort options (relevance, date, name, size)
Web-based Settings ⚙️:
- Configure database path through the UI
- Support for shared network databases
- No need to edit config files manually
Multi-user support - point multiple computers to a shared database
Abbreviation expansion - automatically expands abbreviations to keywords (CSV-based)
Popular words sidebar 🔥 - click common words to add them to your search
GUI Indexer - easy-to-use desktop application for initial setup and indexing
Pagination - browse through unlimited results with page navigation
Sticky search bar - stays visible when scrolling
Fast SQLite3 FTS5 indexing and search
Beautiful web interface with collapsible filters
Network accessible - server binds to all interfaces
Support for multiple formats:
- Word Documents (.docx)
- PDF Documents (.pdf)
- Excel Spreadsheets (.xlsx, .xls)
- CSV Files (.csv)
- Text Files (.txt)
- PowerShell Scripts (.ps1)
- Images (.jpg, .png, .gif, .bmp, .tiff) with OCR
Network share support - index documents from network locations (including UNC paths)
Smart snippet preview - see matching text highlighted in results
File action buttons - open file, open folder, or copy path directly from search results
Configurable default action - click search results to perform your preferred action (open/copy/folder)
Simple update - re-run indexer whenever you need to refresh
Pre-calculated word counts - popular words load instantly (<100ms)

Installation

1. Install Python Dependencies

pip install -r requirements.txt

2. Install Tesseract OCR (Optional - for image text extraction)

For image text extraction, you'll need Tesseract OCR:

Windows:

Download from: https://github.com/UB-Mannheim/tesseract/wiki
Install and add to PATH
Or set the path in your environment

Note: If Tesseract is not installed, images will be skipped during indexing.
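
To confirm whether Tesseract is reachable before you start indexing, you can look for the executable on PATH. This is a minimal sketch and not the project's own detection logic; the function name tesseract_available is hypothetical.

```python
import shutil


def tesseract_available(executable: str = "tesseract") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


# Report what the note above implies for image files.
if tesseract_available():
    print("Tesseract found: image text can be extracted.")
else:
    print("Tesseract not found: images will be skipped during indexing.")
```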

Standalone Executables (No Python Required)

Want to deploy without installing Python? You can create self-contained Windows executables:

Search Server Executable

# Install PyInstaller (one-time)
pip install pyinstaller

# Build the search server executable
python build_exe.py

This creates a standalone application in dist\DocumentSearch\ that can run on any Windows computer without Python installed.

GUI Indexer Executable

# Build the GUI indexer executable
python build_indexer_exe.py

This creates a user-friendly desktop application in dist\DocumentIndexer\ for easy initial setup and document indexing.

📖 See BUILD_INSTRUCTIONS.md for complete details.

Configuration

Edit config.py to set your document path:

# Network share path where documents are located
DOCUMENT_PATH = r"\\192.168.203.207\Shared Folders"

# Or use a local path
# DOCUMENT_PATH = r"C:\Users\YourName\Documents"

Usage

Step 1: Index Your Documents

You can index documents using either the GUI application (recommended for first-time setup) or the command-line indexer.

Option A - GUI Indexer (Recommended)

Launch the GUI indexer:

python indexer_gui.py

The GUI provides:

Easy configuration of document path and database location
Visual progress tracking
Save configuration button
Database statistics viewer
Error reporting

Simply:

Click "Browse..." to select your document folder
Choose where to save/create the database file
Click "Save Configuration" to save your settings
Click "Start Indexing" to begin scanning documents

Option B - Command-Line Indexer

Run the command-line indexer to scan and extract text from all supported documents:

python indexer.py

This will:

Scan the configured document path
Extract text from all supported files
Store content in SQLite database with FTS5 indexing
Show progress and statistics

Note: You can re-run this anytime to update the index with new or modified documents.

Step 2: Start the Web Server

Launch the Flask web server:

Option A - Using Python:

python server.py

Option B - Using Batch File (Windows):

start_server.bat

The server will start and be accessible from:

Local computer: http://localhost:9000
Same network: http://YOUR-IP-ADDRESS:9000

The server binds to all network interfaces, so other computers on your network can reach it at the address shown above.

Step 3: Search Your Documents

Open your browser to http://127.0.0.1:9000
Enter your search query (searches file names, folder names, and content)
Optional: Click "Filters & Options" to refine your search
View results with highlighted snippets
Interact with results:
- Click the result to perform your default action (configurable in Settings)
- Use action buttons: 📂 Open File, 📋 Copy Path, or 📁 Open Folder
- Toast notifications confirm each action

Search Examples:

Search "invoice" - finds files named invoice.pdf, folders called "Invoices", or text containing "invoice"
Search "Acronis" - finds all files in the Acronis folder, files named Acronis*, or documents mentioning Acronis
Search "security" - finds documents with "security" in the content, filename, or folder path
Search abbreviations - automatically expands to keywords (see Abbreviation Expansion below)

Popular Words Sidebar 🔥

The left sidebar displays the 10 most common words found in your indexed documents:

Click any word to add it to your search query
Already searched words appear greyed out (read-only)
Auto-updates after each search
Hidden on smaller screens (< 1024px)

This feature helps you discover frequently used terms and speeds up common searches!

API Integration

SuperSearch Local Docs provides a complete REST API that allows you to integrate document search into your own applications and web pages.

API Integration Example

A ready-to-use example is included: api_integration_example.html

This standalone HTML file demonstrates:

✅ Search documents with live results
✅ Display database statistics
✅ Show popular words
✅ Open files and folders programmatically
✅ Copy file paths to clipboard
✅ Complete API documentation with examples

Quick Start:

Open api_integration_example.html in any web browser
Update the API_BASE_URL constant to match your server
Test the live search and API calls
Copy the code snippets into your own projects

Available API Endpoints

Endpoint	Method	Description
`/api/search`	POST	Search documents with filters and pagination
`/api/stats`	GET	Get database statistics (total docs, databases, last update)
`/api/top-words`	GET	Get the top 10 most common words across all documents
`/api/file/open`	POST	Open a file in its default application
`/api/file/open-folder`	POST	Open the folder containing a file
`/api/settings`	GET/POST	Get or update application settings

Example Search Request:

POST /api/search
{
  "query": "invoice",
  "page": 1,
  "per_page": 20,
  "file_types": ["Word Document", "PDF Document", "Excel Spreadsheet"]
}

Example Response:

{
  "results": [
    {
      "file_name": "Invoice_2024.pdf",
      "file_path": "C:\\Documents\\Invoices\\Invoice_2024.pdf",
      "file_type": "PDF Document",
      "file_size": 245760,
      "modified_date": "2024-01-15T10:30:00",
      "snippet": "...invoice for services rendered in January 2024..."
    }
  ],
  "total": 42,
  "page": 1,
  "total_pages": 3,
  "per_page": 20
}
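
The request above can be sent from Python with the standard library alone. The sketch below is illustrative: it assumes the server is running on the default port, that it accepts a JSON body with a Content-Type of application/json, and the helper names build_search_request and search are hypothetical.

```python
import json
import urllib.request

API_BASE_URL = "http://localhost:9000"  # assumption: default host and port


def build_search_request(query, page=1, per_page=20, file_types=None):
    """Build a POST request for /api/search using the documented fields."""
    payload = {"query": query, "page": page, "per_page": per_page}
    if file_types:
        payload["file_types"] = file_types
    return urllib.request.Request(
        f"{API_BASE_URL}/api/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed to be required
        method="POST",
    )


def search(query, **kwargs):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_search_request(query, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the server running, search("invoice", file_types=["PDF Document"]) should return a dictionary shaped like the example response, so the hits can be read from its "results" list.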

Abbreviation Expansion

The system supports automatic abbreviation expansion using a CSV file. When you search for an abbreviation, it also searches for all associated keywords.

Setup

Create a file named alternate_names.csv in your document path with the following format:

abbreviation,keyword1,keyword2,keyword3,keyword4,keyword5,keyword6,keyword7,keyword8,keyword9,keyword10
API,interface,endpoint,service,rest,web,request
DB,database,sql,storage,query
UI,interface,frontend,display,screen,view
foo,bar,alice,bob,test,sample

Column 1: The abbreviation
Columns 2-11: Up to 10 keywords that the abbreviation represents

How It Works

When you search for "API", the system automatically searches for:

"API" (the original term)
"interface"
"endpoint"
"service"
"rest"
"web"
"request"

This also works in reverse - searching for "interface" will also search for "API".

Example file: See alternate_names_example.csv for a sample format.
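
The two-way lookup described above can be sketched as follows. This is not the code in company_abbreviations.py; the function names are hypothetical, and case-insensitive matching is an assumption.

```python
import csv
import io


def load_abbreviations(csv_file):
    """Read the CSV and return (forward, reverse) lookup dictionaries.

    forward maps an abbreviation to its keywords;
    reverse maps a keyword back to the abbreviations that list it.
    """
    forward, reverse = {}, {}
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the header row
    for row in reader:
        cells = [cell.strip() for cell in row if cell.strip()]
        if len(cells) < 2:
            continue  # no keywords on this row
        abbreviation, keywords = cells[0], cells[1:11]  # at most 10 keywords
        forward[abbreviation.lower()] = keywords
        for keyword in keywords:
            reverse.setdefault(keyword.lower(), []).append(abbreviation)
    return forward, reverse


def expand(term, forward, reverse):
    """Return the original term followed by every related term, without repeats."""
    key = term.lower()
    expanded = [term]
    for extra in forward.get(key, []) + reverse.get(key, []):
        if extra.lower() not in [t.lower() for t in expanded]:
            expanded.append(extra)
    return expanded


# Small in-memory sample in the same format as alternate_names.csv.
SAMPLE = io.StringIO(
    "abbreviation,keyword1,keyword2\n"
    "API,interface,endpoint\n"
    "UI,interface,frontend\n"
)
FORWARD, REVERSE = load_abbreviations(SAMPLE)
print(expand("API", FORWARD, REVERSE))
print(expand("interface", FORWARD, REVERSE))
```

Because "interface" appears on both the API and UI rows of the sample, the reverse lookup in this sketch returns both abbreviations.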

Configuration via Web Interface

Click the ⚙️ Settings button in the top-right corner to configure:

Database Path

Change where the application looks for the indexed documents database. Perfect for multi-user setups!

Example use case: Share a database across multiple computers

\\192.168.203.207\Shared Folders\Databases\documents.sqlite3

Document Path

Set the root folder to scan when running the indexer.

Default Click Action

Choose what happens when you click on a search result:

Open file - Opens the file in its associated program (default)
Copy path - Copies the file path to clipboard
Open folder - Opens the folder containing the file

After changing settings:

Click "Save Changes"
Refresh your browser page
The new settings take effect as soon as the page reloads

For detailed configuration options, see SETTINGS.md

Advanced Filtering

Click "Filters & Options" to access powerful filtering:

Search Scope:

All - Search across filenames, folders, and content (default)
Filenames Only - Search only in document names
Folder Names Only - Search only in folder/subfolder names
File Content Only - Search only inside document text

File Type Filters:

Filter by Word, PDF, Excel, CSV, or Images
Select multiple types or uncheck to exclude types

Sort Options:

Relevance - Best matches first (default)
Date Modified - Newest documents first
File Name - Alphabetical order
File Size - Largest files first

Date Range:

Filter by modified date (from/to)

File Size:

Filter by minimum or maximum file size in bytes
Example: 1048576 = 1MB

Project Structure

SuperSearchLocalDocs/
├── config.py                      # Configuration settings
├── config_manager.py              # User configuration manager
├── database_manager.py            # Multi-database management
├── indexer.py                     # Command-line document indexer
├── indexer_gui.py                 # GUI document indexer (recommended)
├── server.py                      # Flask web server
├── search_gui.py                  # Desktop search application
├── company_abbreviations.py       # Abbreviation expansion system
├── build_exe.py                   # Build script for search server
├── build_indexer_exe.py           # Build script for GUI indexer
├── build_search_gui_exe.py        # Build script for search GUI
├── build_all.py                   # Build all executables
├── start_server.bat               # Windows batch file to start server
├── requirements.txt               # Python dependencies
├── user_config.json               # User settings (created after configuration)
├── vacuum_databases.py            # Database compaction utility
├── api_integration_example.html   # API integration example page
├── alternate_names.csv            # Abbreviation mappings (optional)
├── alternate_names_example.csv    # Example abbreviation file
├── templates/
│   └── index.html                # Web search interface
├── databases/                     # Database files (created after indexing)
│   └── *.sqlite3                 # One database per indexed folder
├── BUILD_INSTRUCTIONS.md          # Executable build documentation
├── SETTINGS.md                    # Configuration documentation
├── WORD_COUNTS_OPTIMIZATION.md    # Word counts feature documentation
└── README.md

How It Works

Indexing: The indexer walks through your document folder, extracts text from each file based on its type, and stores both metadata and content in SQLite3
Database: SQLite3 FTS5 (Full-Text Search 5) provides fast full-text search capabilities with ranking across file names, folder paths, and content
Search: The web interface sends queries to the Flask API, which uses FTS5 to find matching documents across all indexed fields
Results: Matching documents are displayed with snippets showing the search terms in context
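
The index-then-search flow above can be tried with Python's built-in sqlite3 module. This is a sketch only: the table name, the column layout, and the snippet markers are assumptions rather than the project's actual schema, and it needs an SQLite build with FTS5 enabled.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 table from (file_name, folder_path, content) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE documents USING fts5(file_name, folder_path, content)"
    )
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?)", rows)
    return conn


def search(conn, query, limit=20):
    """Return (file_name, snippet) pairs, best matches first."""
    sql = """
        SELECT file_name,
               snippet(documents, 2, '[', ']', '...', 8)  -- column 2 = content
        FROM documents
        WHERE documents MATCH ?
        ORDER BY rank
        LIMIT ?
    """
    return conn.execute(sql, (query, limit)).fetchall()


conn = build_index([
    ("Invoice_2024.pdf", r"C:\Documents\Invoices",
     "invoice for services rendered in January 2024"),
    ("notes.txt", r"C:\Documents", "meeting notes about security"),
])
for file_name, preview in search(conn, "invoice"):
    print(file_name, "->", preview)
```

Matching the whole table with MATCH searches all three columns at once, which corresponds to the "All" search scope. Note that FTS5 has its own query syntax, so raw user input containing quotes or operators may need escaping before it is passed in.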

Updating the Index

To add new documents or refresh the index:

python indexer.py

The indexer will:

Update existing documents if they've changed
Add new documents
Keep the database in sync with your document folder
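
How indexer.py decides that a document has changed is not described here. One common approach, shown below purely as an assumption, is to compare each file's size and modification time with the values recorded on the previous run. The function name plan_update and the shape of the indexed dictionary are hypothetical, and removing records for deleted files is likewise an assumption.

```python
import os


def plan_update(root, indexed):
    """Compare files under root with the previous run's records.

    indexed maps file paths to (size, mtime) tuples from the last run.
    Returns (to_index, to_remove): new or changed files, and records whose
    files no longer exist.
    """
    current = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            stat = os.stat(path)
            current[path] = (stat.st_size, stat.st_mtime)
    to_index = sorted(p for p, sig in current.items() if indexed.get(p) != sig)
    to_remove = sorted(p for p in indexed if p not in current)
    return to_index, to_remove
```

On a first run the indexed dictionary is empty, so every file under root ends up in to_index.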

Performance Optimization

Word Counts & Popular Words

The "Popular Words" feature uses pre-calculated word counts for instant performance (<100ms instead of 15+ seconds):

Word frequencies are calculated during indexing
Stored in a dedicated word_counts table with indexed queries
Common stop words (including "nan" from spreadsheets) are automatically filtered
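
The idea of counting once at index time and reading from a small table afterwards can be sketched as below. Only the table name word_counts comes from this README; the column names, the index name, the tokenizing rule, and the tiny stop-word set are assumptions for illustration.

```python
import re
import sqlite3
from collections import Counter

STOP_WORDS = {"the", "and", "for", "nan"}  # illustrative subset only


def count_words(texts):
    """Count lower-cased words, skipping stop words and very short tokens."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) > 2 and word not in STOP_WORDS:
                counts[word] += 1
    return counts


def store_counts(conn, counts):
    """Write the counts into a word_counts table with an index on frequency."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts "
        "(word TEXT PRIMARY KEY, frequency INTEGER)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO word_counts VALUES (?, ?)", list(counts.items())
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_word_frequency "
        "ON word_counts(frequency DESC)"
    )
    conn.commit()


def top_words(conn, limit=10):
    """Read the most common words straight from the pre-calculated table."""
    return conn.execute(
        "SELECT word, frequency FROM word_counts "
        "ORDER BY frequency DESC, word LIMIT ?",
        (limit,),
    ).fetchall()
```

Reading the sidebar then costs one small indexed query instead of a scan over every document's text.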

For existing databases that were indexed before this optimization:

Open DocumentIndexer.exe
Click "🔨 Rebuild Word Counts" button
Wait for completion (rebuilds word statistics from existing documents)

Database Compaction

After rebuilding word counts, you can reclaim disk space by vacuuming the databases.

Option 1 - Using GUI (Recommended):

Open DocumentIndexer.exe
Click "🗜️ Vacuum Databases" button
Confirm and wait for completion
View space savings in the summary

Option 2 - Using Command Line:

python vacuum_databases.py

This runs SQLite's VACUUM command on all databases to:

Remove deleted/fragmented data
Reclaim disk space (especially after removing "nan" entries)
Improve query performance
Report the size before and after, and the space saved
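
For reference, vacuuming a single SQLite file and measuring the result takes only a few lines. This is a sketch of the general technique, not the contents of vacuum_databases.py; the function name vacuum_database is hypothetical.

```python
import os
import sqlite3


def vacuum_database(path):
    """Run VACUUM on one database file and return (bytes_before, bytes_after)."""
    before = os.path.getsize(path)
    # isolation_level=None keeps the connection in autocommit mode,
    # because VACUUM cannot run inside an open transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        conn.execute("VACUUM")
    finally:
        conn.close()
    return before, os.path.getsize(path)
```

The difference between the two returned values is the space reclaimed. VACUUM rewrites the whole file, so it temporarily needs free disk space of roughly the database's size.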

📖 See WORD_COUNTS_OPTIMIZATION.md for technical details.

Multi-Database Support

SuperSearch now supports indexing multiple folders, each with its own database:

Each indexed folder gets a unique database file (MD5-hashed filename)
Search across all databases simultaneously
Statistics aggregated from all indexed folders
Popular words combined across all documents

This allows you to:

Index multiple network shares or local folders
Keep databases separate for easier management
Search everything from a single interface
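
A folder path can be turned into a stable database filename by hashing it. The sketch below shows one way to do that; which string the project actually hashes, how it normalizes the path, and the function name database_path_for are all assumptions.

```python
import hashlib
import os


def database_path_for(folder, databases_dir="databases"):
    """Derive a per-folder database path from the MD5 hash of the folder path."""
    normalized = os.path.normcase(os.path.normpath(folder))
    digest = hashlib.md5(normalized.encode("utf-8")).hexdigest()
    return os.path.join(databases_dir, f"{digest}.sqlite3")
```

The same folder always maps to the same file, so re-running the indexer on a folder updates its existing database rather than creating a new one. MD5 is used here only to produce a filename, not for security.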

Supported File Types

Format	Extension	Library Used
Word	.docx	python-docx
PDF	.pdf	PyPDF2
Excel	.xlsx	openpyxl
Excel (Legacy)	.xls	pandas + xlrd
CSV	.csv	pandas
Text	.txt	Built-in
PowerShell	.ps1	Built-in
Images	.jpg, .png, .gif, .bmp, .tiff	Pillow + pytesseract

Troubleshooting

Network path not accessible

Ensure you have permissions to access the network share
Try mapping the network drive first
Check that the path in config.py is correct

Tesseract not found

Install Tesseract OCR from the link above
Add Tesseract to your system PATH
Or skip image indexing (other formats will still work)

Database not found error

Run python indexer.py first to create the database
The database is created in the same folder as the scripts

Cannot access from other computers

Open DocumentIndexer.exe
Enter your computer's network IP address (e.g., 192.168.1.100) in "Server IP Address"
Click "💾 Save Server Config"
Restart DocumentSearch.exe or DocumentSearchGUI.exe
Other computers can now access at http://YOUR-IP:9000

Future Enhancements

Potential improvements:

Advanced search syntax (AND, OR, NOT operators)
Export search results to CSV/Excel
Document preview in browser
Multi-language support
Scheduled automatic indexing
Tag and categorize documents
Search history and saved searches
Duplicate file detection

License

See LICENSE file for details.
