ocrtool-mcp is an open-source, macOS-native OCR module built with Swift and the Vision framework, designed to comply with the Model Context Protocol (MCP). It can be invoked by AI IDE tools such as Claude Desktop, Cursor, Continue, Windsurf, Cline, and Cherry Studio, or by custom agents, using JSON-RPC over stdin.
- Accurate OCR powered by the macOS Vision framework
- Recognizes both Chinese and English text
- MCP-compatible JSON-RPC interface
- Returns line-wise OCR results with bounding boxes (in pixels)
- Multiple image input methods (local path, URL, Base64)
- Flexible output formats (plain text, Markdown table, JSON, code comments)
- Lightweight, fast, and fully offline
- Free and open-source software
brew tap ihugang/ocrtool
brew install ocrtool-mcp
Ready to use after installation:
ocrtool-mcp --help
Download the pre-compiled Universal Binary that supports all Macs (Intel and Apple Silicon):
# Download latest version (v1.0.0)
curl -L -O https://github.com/ihugang/ocrtool-mcp/releases/download/v1.0.0/ocrtool-mcp-v1.0.0-universal-macos.tar.gz
# Extract
tar -xzf ocrtool-mcp-v1.0.0-universal-macos.tar.gz
# Make executable
chmod +x ocrtool-mcp-v1.0.0-universal
# Move to system path (recommended)
sudo mv ocrtool-mcp-v1.0.0-universal /usr/local/bin/ocrtool-mcp
# Verify installation
ocrtool-mcp --help
Alternatively, you can download directly from the GitHub Releases page.
If you prefer to build from source or contribute to development:
git clone https://github.com/ihugang/ocrtool-mcp.git
cd ocrtool-mcp
swift build -c release
The executable will be located at .build/release/ocrtool-mcp
ocrtool-mcp --help
# Or if built from source
.build/release/ocrtool-mcp --help
ocrtool-mcp
# Or if built from source
.build/release/ocrtool-mcp
Send a JSON-RPC request via stdin:
{
"jsonrpc": "2.0",
"id": "1",
"method": "ocr_text",
"params": {
"image": "test.jpg",
"lang": "zh+en",
"format": "text"
}
}
| Parameter | Type | Description | Example |
|---|---|---|---|
| image / image_path | String | Local image path (supports relative paths and ~ expansion) | "~/Desktop/test.jpg" |
| url | String | Image URL (downloaded automatically) | "https://example.com/img.jpg" |
| base64 | String | Base64-encoded image data | "iVBORw0KGgo..." |
| lang | String | Recognition languages, separated by + | "zh+en" (default), "en" |
| enhanced | Boolean | Use enhanced recognition | true (default) |
| format | String | Output format | See format options below |
| output.insertAsComment | Boolean | Format result as code comments | true / false |
| output.language | String | Language style for code comments | "python", "swift", "html" |
Note: Exactly one of image/image_path, url, or base64 must be provided.
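As a minimal sketch of how a client might assemble a request that satisfies this rule, the helper below builds an ocr_text request from the documented parameters and rejects calls that supply zero or several image sources. The names build_ocr_request and encode_image_bytes are hypothetical and not part of ocrtool-mcp, and treating image and image_path as aliases for a single source is an assumption based on the parameter table.

```python
import base64
import json

# Parameter names come from the parameter table; image and image_path are
# assumed to be aliases for the same local-path source.
SOURCE_KEYS = ("image", "image_path", "url", "base64")


def build_ocr_request(request_id, **params):
    """Build a JSON-RPC 2.0 request string for ocr_text (hypothetical helper)."""
    sources = [key for key in SOURCE_KEYS if params.get(key)]
    if len(sources) != 1:
        raise ValueError(
            f"expected exactly one image source, got {len(sources)}: {sources}"
        )
    return json.dumps({
        "jsonrpc": "2.0",
        "id": str(request_id),
        "method": "ocr_text",
        "params": params,
    })


def encode_image_bytes(data):
    """Base64-encode raw image bytes for use as the base64 parameter."""
    return base64.b64encode(data).decode("ascii")
```

For example, `build_ocr_request(1, url="https://example.com/img.jpg", format="text")` returns a request string that can be written to the tool's stdin, while passing both `image` and `url` raises `ValueError` before anything is sent.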
| Format Value | Description | Output Example |
|---|---|---|
| text / simple | Plain text, one line per result | Hello\nWorld |
| table / markdown | Markdown table (with coordinates) | See examples below |
| structured / full | Full JSON-RPC response (with bbox) | See Quick Start section |
| auto | Auto-select: text for a single line, table for multiple lines | - |
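The text, table, and auto formats can be illustrated with a small client-side sketch. It is not the tool's own implementation. It assumes each recognized line is a dict with a text field and a bbox dict holding x, y, width, and height keys; the bbox key names are an assumption, since this document does not show the exact shape of the structured output.

```python
def render_text(lines):
    """Plain text: one recognized line per output line."""
    return "\n".join(line["text"] for line in lines)


def render_table(lines):
    """Markdown table with coordinates, one row per recognized line."""
    rows = ["| Text | X | Y | Width | Height |", "|---|---|---|---|---|"]
    for line in lines:
        box = line["bbox"]  # assumed keys: x, y, width, height
        rows.append(
            f"| {line['text']} | {box['x']} | {box['y']} "
            f"| {box['width']} | {box['height']} |"
        )
    return "\n".join(rows)


def render_auto(lines):
    """auto: plain text for at most one line, a table for multiple lines."""
    return render_text(lines) if len(lines) <= 1 else render_table(lines)
```

Treating an empty result as plain text is a choice made for this sketch; the document only specifies the single-line and multi-line cases.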
Claude Desktop uses claude_desktop_config.json to configure MCP servers.
Configuration File Location:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Configuration Example:
{
"mcpServers": {
"ocrtool": {
"command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
}
}
}
Usage:
In Claude Desktop chat:
Please recognize this image: ~/Desktop/screenshot.png
Or more specifically:
Use ocr_text tool to recognize text in ~/Desktop/receipt.jpg and output as a table
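If the configuration file already lists other MCP servers, the ocrtool entry has to be merged in rather than pasted over the whole file. The following sketch shows one way to do that; add_mcp_server and update_config_file are hypothetical helper names, and the only structure assumed is the mcpServers object shown in the configuration example.

```python
import json
from pathlib import Path


def add_mcp_server(config, name, command):
    """Return a copy of config with an mcpServers entry added or replaced."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = {"command": command}
    updated["mcpServers"] = servers
    return updated


def update_config_file(path, name, command):
    """Read the JSON config at path (if any), add the server, and write it back."""
    config_path = Path(path).expanduser()
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    merged = add_mcp_server(config, name, command)
    config_path.write_text(json.dumps(merged, indent=2))
```

A call such as `update_config_file("~/Library/Application Support/Claude/claude_desktop_config.json", "ocrtool", "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp")` would preserve existing servers; restart Claude Desktop afterwards so the change is picked up.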
Configuration File Location:
- macOS: ~/.cursor/config.json, or via the Cursor Settings UI
Configuration Example:
{
"mcpServers": {
"ocrtool-mcp": {
"command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
}
}
}
Usage:
In Cursor AI chat:
@ocrtool-mcp recognize text from this image: ./assets/diagram.png
Configuration File Location:
- macOS: ~/.continue/config.json
Configuration Example:
{
"experimental": {
"modelContextProtocolServers": [
{
"name": "ocrtool-mcp",
"command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
}
]
}
}
Configuration (via Settings UI):
- Open Windsurf Settings
- Find MCP Servers configuration
- Add new server:
  - Name: ocrtool-mcp
  - Command: /path/to/ocrtool-mcp/.build/release/ocrtool-mcp
Configuration File Location:
- macOS: ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json
Configuration Example:
{
"mcpServers": {
"ocrtool-mcp": {
"command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
}
}
}
Configuration (via UI):
- Open Cherry Studio Settings
- Navigate to Settings → MCP Servers → Add Server
- Fill in server information:
  - Name: ocrtool-mcp
  - Type: STDIO
  - Command: /path/to/ocrtool-mcp/.build/release/ocrtool-mcp
  - Arguments: (leave empty)
  - Environment Variables: (leave empty)
- Save configuration
Usage:
In Cherry Studio chat interface, if the model supports MCP tool calls, you'll see a wrench icon to directly invoke OCR functionality:
Recognize text from this image: ~/Desktop/screenshot.png
{
"jsonrpc": "2.0",
"id": "1",
"method": "ocr_text",
"params": {
"image": "~/Desktop/screenshot.png",
"format": "text"
}
}
Output:
你好世界
Hello World
{
"jsonrpc": "2.0",
"id": "2",
"method": "ocr_text",
"params": {
"url": "https://example.com/receipt.jpg",
"lang": "zh+en",
"format": "markdown"
}
}
Output:
| Text | X | Y | Width | Height |
|------|---|---|--------|--------|
| 商品名称 | 120 | 50 | 200 | 30 |
| 总计：¥99.00 | 120 | 450 | 250 | 28 |
{
"jsonrpc": "2.0",
"id": "3",
"method": "ocr_text",
"params": {
"base64": "iVBORw0KGgoAAAANSUhEUgAAAAUA...",
"format": "structured"
}
}
{
"jsonrpc": "2.0",
"id": "4",
"method": "ocr_text",
"params": {
"image": "./code_screenshot.png",
"output.insertAsComment": true,
"output.language": "python"
}
}
Output:
# def hello():
#     print("Hello World")
The project includes a practical Python example script, test/python/rename_images_by_ocr.py, that demonstrates how to use OCR to automatically rename image files with garbled names on the Desktop.
import json
import subprocess


def ocr_image(image_path, ocr_tool_path):
    """Call ocrtool-mcp to recognize an image and return the recognized lines."""
    json_rpc = json.dumps({
        "jsonrpc": "2.0",
        "id": "1",
        "method": "ocr_text",
        "params": {
            "image": image_path,
            "format": "structured",
            "lang": "zh+en"
        }
    })
    # Write the request to the tool's stdin directly instead of piping it
    # through `echo` in a shell, so quotes in the path cannot break the command.
    proc = subprocess.run([ocr_tool_path],
                          input=(json_rpc + "\n").encode(),
                          capture_output=True)
    result = json.loads(proc.stdout.decode())
    return result.get("result", {}).get("lines", [])


# Usage example
lines = ocr_image("~/Desktop/test.png",
                  "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp")
for line in lines:
    print(f"Text: {line['text']}, BBox: {line['bbox']}")
Solution: Use the full path to the compiled executable, e.g.:
/Users/username/ocrtool-mcp/.build/release/ocrtool-mcp
Solution:
- Check that the configuration file path is correct
- Restart Claude Desktop application
- Check log files (if available) for error messages
Solution:
- Verify the image path is correct and file exists
- Verify the image format is supported (PNG, JPG, JPEG, BMP, GIF, TIFF)
- Check if the image contains recognizable text
- Try using the enhanced: true parameter for better accuracy
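The first two checks in this list can be automated before a request is sent. The sketch below is a client-side convenience, not part of ocrtool-mcp: check_image is a hypothetical name, and it judges the format only by file extension, using the formats listed above.

```python
from pathlib import Path

# Extensions for the formats listed in the troubleshooting notes.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tiff"}


def check_image(path):
    """Return a list of problems found with a local image path (empty if none)."""
    problems = []
    resolved = Path(path).expanduser()  # expand a leading ~ like the tool does
    if not resolved.is_file():
        problems.append(f"file not found: {resolved}")
    if resolved.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {resolved.suffix or '(none)'}")
    return problems
```

An empty list means the path exists and has a listed extension; it says nothing about whether the image actually contains recognizable text.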
Solution:
chmod +x .build/release/ocrtool-mcp
.
├── Package.swift                         # Swift package configuration
├── Sources/OCRToolMCP/main.swift         # Main program source
├── test/python/rename_images_by_ocr.py   # Python usage example
├── README.md                             # English documentation
├── README.zh.md                          # Chinese documentation
├── LICENSE                               # MIT License
└── .gitignore
- Hu Gang (ihugang)
Issues and Pull Requests are welcome!
MIT License