ocrtool-mcp

ocrtool-mcp is an open-source, macOS-native OCR module built with Swift and the Vision framework that implements the Model Context Protocol (MCP). AI tools such as Claude Desktop, Cursor, Continue, Windsurf, Cline, and Cherry Studio, as well as custom agents, can invoke it by sending JSON-RPC requests over stdin.

✨ Features

✅ Accurate OCR powered by the macOS Vision framework
✅ Recognizes both Chinese and English text
✅ MCP-compatible JSON-RPC interface
✅ Returns line-by-line OCR results with bounding boxes (in pixels)
✅ Multiple image input methods (local path, URL, Base64)
✅ Flexible output formats (plain text, Markdown table, JSON, code comments)
✅ Lightweight and fast; recognition runs fully offline (only URL input needs a network connection)
✅ Free and open-source software

📦 Installation

Method 1: Using Homebrew (Easiest)

brew tap ihugang/ocrtool
brew install ocrtool-mcp

Ready to use after installation:

ocrtool-mcp --help

Method 2: Download Pre-built Binary

Download the pre-compiled Universal Binary that supports all Macs (Intel and Apple Silicon):

# Download latest version (v1.0.0)
curl -L -O https://github.com/ihugang/ocrtool-mcp/releases/download/v1.0.0/ocrtool-mcp-v1.0.0-universal-macos.tar.gz

# Extract
tar -xzf ocrtool-mcp-v1.0.0-universal-macos.tar.gz

# Make executable
chmod +x ocrtool-mcp-v1.0.0-universal

# Move to system path (recommended)
sudo mv ocrtool-mcp-v1.0.0-universal /usr/local/bin/ocrtool-mcp

# Verify installation
ocrtool-mcp --help

Alternatively, you can download directly from the GitHub Releases page.

Method 3: Build from Source

If you prefer to build from source or contribute to development:

git clone https://github.com/ihugang/ocrtool-mcp.git
cd ocrtool-mcp
swift build -c release

The executable will be located at .build/release/ocrtool-mcp.

🚀 Quick Start

View Help

ocrtool-mcp --help
# Or if built from source
.build/release/ocrtool-mcp --help

Run as MCP Module

ocrtool-mcp
# Or if built from source
.build/release/ocrtool-mcp

Send a JSON-RPC request via stdin:

{
  "jsonrpc": "2.0",
  "id": "1",
  "method": "ocr_text",
  "params": {
    "image": "test.jpg",
    "lang": "zh+en",
    "format": "text"
  }
}

📋 Parameters

Core Parameters

| Parameter | Type | Description | Example |
|-----------|------|-------------|---------|
| `image` / `image_path` | String | Local image path (supports relative paths and `~` expansion) | `"~/Desktop/test.jpg"` |
| `url` | String | Image URL (downloaded automatically) | `"https://example.com/img.jpg"` |
| `base64` | String | Base64-encoded image data | `"iVBORw0KGgo..."` |
| `lang` | String | Recognition languages, separated by `+` | `"zh+en"` (default), `"en"` |
| `enhanced` | Boolean | Use enhanced recognition | `true` (default) |
| `format` | String | Output format | See format options below |
| `output.insertAsComment` | Boolean | Format the result as code comments | `true` / `false` |
| `output.language` | String | Language style for code comments | `"python"`, `"swift"`, `"html"` |

Note: Exactly one of image/image_path, url, or base64 must be provided.
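The one-source rule is easy to check on the client before spawning the process. The sketch below uses a hypothetical helper, `build_ocr_request`, which is not part of ocrtool-mcp; it assumes only the parameter names and format values documented in the tables in this README, and serializes the request as a single line of JSON.

```python
import json

# Image source keys documented above; exactly one must be present.
SOURCE_KEYS = ("image", "image_path", "url", "base64")
# Format values listed in the "Output Format Options" table.
FORMATS = {"text", "simple", "table", "markdown", "structured", "full", "auto"}


def build_ocr_request(request_id, params):
    """Hypothetical helper: validate params and return a one-line ocr_text request."""
    sources = [key for key in SOURCE_KEYS if key in params]
    if len(sources) != 1:
        raise ValueError(
            f"exactly one of {', '.join(SOURCE_KEYS)} is required, got {sources or 'none'}"
        )
    fmt = params.get("format")
    if fmt is not None and fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    request = {
        "jsonrpc": "2.0",
        "id": str(request_id),
        "method": "ocr_text",
        # Dotted keys such as "output.insertAsComment" stay flat, as in Example 4.
        "params": dict(params),
    }
    # ensure_ascii=False keeps Chinese paths readable in the serialized request.
    return json.dumps(request, ensure_ascii=False)


print(build_ocr_request(1, {"image": "test.jpg", "lang": "zh+en", "format": "text"}))
```

This check treats `image` and `image_path` as two separate sources, so passing both is rejected; that is a deliberately strict reading of the note above.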

Output Format Options (`format` parameter)

| Format value | Description | Output example |
|--------------|-------------|----------------|
| `text` / `simple` | Plain text, one line per result | `Hello\nWorld` |
| `table` / `markdown` | Markdown table (with coordinates) | See Example 2 below |
| `structured` / `full` | Full JSON-RPC response (with bounding boxes) | Lines with `text` and `bbox` fields (see the Python usage example) |
| `auto` | Auto-select: text for a single line, table for multiple lines | - |
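To make the format options concrete, here is a client-side sketch that turns recognized lines into the `text` and `table` layouts shown in this README and applies the documented `auto` rule. It is not ocrtool-mcp's implementation; the line fields (`text`, `x`, `y`, `width`, `height`) are assumptions modeled on the Markdown table in Example 2.

```python
def render_lines(lines, fmt="auto"):
    """Illustrative only: render OCR lines as plain text or a Markdown table."""
    if fmt == "auto":
        # Documented rule: text for a single line, table for multiple lines.
        fmt = "text" if len(lines) == 1 else "table"
    if fmt in ("text", "simple"):
        return "\n".join(line["text"] for line in lines)
    if fmt in ("table", "markdown"):
        rows = [
            "| Text | X | Y | Width | Height |",
            "|------|---|---|-------|--------|",
        ]
        for line in lines:
            # Escape pipes so recognized text cannot break the table layout.
            text = line["text"].replace("|", "\\|")
            rows.append(
                f"| {text} | {line['x']} | {line['y']} | "
                f"{line['width']} | {line['height']} |"
            )
        return "\n".join(rows)
    raise ValueError(f"unsupported format: {fmt!r}")
```

Keeping the rendering on the client can be handy if you request `structured` output once and need several presentations of the same result.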

🛠 AI Tool Configuration Guide

Claude Desktop (Claude Code)

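Claude Desktop uses claude_desktop_config.json to configure MCP servers. In this and the following examples, replace /path/to/ocrtool-mcp/.build/release/ocrtool-mcp with the actual path to your binary (for example, the output of `which ocrtool-mcp` if you installed it via Homebrew or Method 2).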

Configuration File Location:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Configuration Example:

{
  "mcpServers": {
    "ocrtool": {
      "command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
    }
  }
}
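If the configuration file already lists other MCP servers, it is safer to merge the entry than to overwrite the file. The sketch below is a hypothetical helper (`add_ocrtool_server` is not shipped with ocrtool-mcp); it assumes the file location and `mcpServers` layout shown above. Back up the file before running it, and restart Claude Desktop afterwards.

```python
import json
import os

DEFAULT_CONFIG = "~/Library/Application Support/Claude/claude_desktop_config.json"


def add_ocrtool_server(command, config_path=DEFAULT_CONFIG, name="ocrtool"):
    """Add or update an ocrtool-mcp entry without touching other configured servers."""
    path = os.path.expanduser(config_path)
    config = {}
    if os.path.exists(path):
        # An existing file must contain valid JSON; json.load raises otherwise.
        with open(path, encoding="utf-8") as f:
            config = json.load(f)
    config.setdefault("mcpServers", {})[name] = {"command": command}
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
        f.write("\n")
    return config


# Example (hypothetical install path):
# add_ocrtool_server("/usr/local/bin/ocrtool-mcp")
```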

Usage:

In Claude Desktop chat:

Please recognize this image: ~/Desktop/screenshot.png

Or more specifically:

Use ocr_text tool to recognize text in ~/Desktop/receipt.jpg and output as a table

Cursor

Configuration File Location:

macOS: ~/.cursor/mcp.json (global), .cursor/mcp.json inside a project, or via the Cursor Settings UI

Configuration Example:

{
  "mcpServers": {
    "ocrtool-mcp": {
      "command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
    }
  }
}

Usage:

In Cursor AI chat:

@ocrtool-mcp recognize text from this image: ./assets/diagram.png

Continue

Configuration File Location:

macOS: ~/.continue/config.json

Configuration Example:

{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "name": "ocrtool-mcp",
        "command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
      }
    ]
  }
}

Windsurf

Configuration (via Settings UI):

Open Windsurf Settings
Find MCP Servers configuration
Add new server:
- Name: ocrtool-mcp
- Command: /path/to/ocrtool-mcp/.build/release/ocrtool-mcp

Cline (VSCode Extension)

Configuration File Location:

macOS: ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json

Configuration Example:

{
  "mcpServers": {
    "ocrtool-mcp": {
      "command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
    }
  }
}

Cherry Studio

Configuration (via UI):

Open Cherry Studio Settings
Navigate to Settings → MCP Servers → Add Server
Fill in server information:
- Name: ocrtool-mcp
- Type: STDIO
- Command: /path/to/ocrtool-mcp/.build/release/ocrtool-mcp
- Arguments: (leave empty)
- Environment Variables: (leave empty)
Save configuration

Usage:

In the Cherry Studio chat interface, if the selected model supports MCP tool calls, a wrench icon appears that lets you invoke the OCR tool directly:

Recognize text from this image: ~/Desktop/screenshot.png

💡 Usage Examples

Example 1: Recognize Local Image (Plain Text Output)

{
  "jsonrpc": "2.0",
  "id": "1",
  "method": "ocr_text",
  "params": {
    "image": "~/Desktop/screenshot.png",
    "format": "text"
  }
}

Output:

你好世界
Hello World

Example 2: Recognize Image from URL (Markdown Table)

{
  "jsonrpc": "2.0",
  "id": "2",
  "method": "ocr_text",
  "params": {
    "url": "https://example.com/receipt.jpg",
    "lang": "zh+en",
    "format": "markdown"
  }
}

Output:

| Text | X | Y | Width | Height |
|------|---|---|--------|--------|
| 商品名称 | 120 | 50 | 200 | 30 |
| 总计：¥99.00 | 120 | 450 | 250 | 28 |

Example 3: Base64 Image Recognition

{
  "jsonrpc": "2.0",
  "id": "3",
  "method": "ocr_text",
  "params": {
    "base64": "iVBORw0KGgoAAAANSUhEUgAAAAUA...",
    "format": "structured"
  }
}
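To produce the `base64` value from a local file, standard-library encoding is enough. This is a minimal sketch; `base64_params` is a hypothetical name, and because the README does not say whether a `data:image/...;base64,` prefix is accepted, it sends the bare Base64 string as in the example above.

```python
import base64
from pathlib import Path


def base64_params(image_path, fmt="structured"):
    """Read a local image and return ocr_text params that carry it as Base64."""
    data = Path(image_path).expanduser().read_bytes()
    return {
        # b64encode returns bytes; the JSON request needs an ASCII string.
        "base64": base64.b64encode(data).decode("ascii"),
        "format": fmt,
    }


# Example: params = base64_params("~/Desktop/receipt.png")
```

Base64 grows the payload by roughly a third, so for large local files the `image` parameter is usually the lighter option.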

Example 4: Generate Python Comment Format

{
  "jsonrpc": "2.0",
  "id": "4",
  "method": "ocr_text",
  "params": {
    "image": "./code_screenshot.png",
    "output.insertAsComment": true,
    "output.language": "python"
  }
}

Output:

# def hello():
#     print("Hello World")
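If you request plain text and want to comment it yourself, the sketch below reproduces the Python output above. The `swift` and `html` styles are assumptions about reasonable comment syntax, not a description of what ocrtool-mcp emits for those `output.language` values.

```python
# Line-comment markers per language (python matches Example 4; swift is an assumption).
LINE_PREFIXES = {"python": "#", "swift": "//"}


def as_comments(text, language):
    """Illustrative only: turn recognized text into comments for a target language."""
    lines = text.splitlines()
    if language in LINE_PREFIXES:
        prefix = LINE_PREFIXES[language]
        # Add a space after the marker only on non-empty lines to avoid trailing whitespace.
        return "\n".join(f"{prefix} {line}" if line else prefix for line in lines)
    if language == "html":
        # HTML has no line comments, so wrap the whole block once.
        return "<!--\n" + "\n".join(lines) + "\n-->"
    raise ValueError(f"unsupported comment language: {language!r}")
```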

🐍 Python Usage Example

The project includes a practical Python example script, test/python/rename_images_by_ocr.py, that uses OCR to automatically rename desktop image files whose names are garbled or meaningless. The snippet below shows the basic pattern for calling ocrtool-mcp from Python:

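import json
import subprocess

def ocr_image(image_path, ocr_tool_path):
    """Send one ocr_text request to ocrtool-mcp over stdin and return the recognized lines."""
    request = {
        "jsonrpc": "2.0",
        "id": "1",
        "method": "ocr_text",
        "params": {
            "image": image_path,
            "format": "structured",
            "lang": "zh+en"
        }
    }

    # Write the request to stdin directly instead of `echo '...' | tool` with shell=True,
    # which breaks on paths containing quotes and allows shell injection.
    proc = subprocess.run(
        [ocr_tool_path],
        input=json.dumps(request) + "\n",
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"ocrtool-mcp exited with {proc.returncode}: {proc.stderr.strip()}")

    response = json.loads(proc.stdout)
    # JSON-RPC reports failures in an "error" member instead of "result".
    if "error" in response:
        raise RuntimeError(f"OCR failed: {response['error']}")
    return response.get("result", {}).get("lines", [])

# Usage example
lines = ocr_image("~/Desktop/test.png",
                  "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp")
for line in lines:
    print(f"Text: {line['text']}, BBox: {line['bbox']}")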

🔧 Troubleshooting

Issue 1: "command not found"

Solution: Use the full path to the compiled executable, e.g.:

/Users/username/ocrtool-mcp/.build/release/ocrtool-mcp
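GUI apps such as Claude Desktop may not inherit the PATH from your shell profile, so writing an absolute path into the configuration is the most reliable option. A minimal sketch for locating one; `find_ocrtool` is a hypothetical helper, and the candidate list is whatever locations you pass in (for example your source build directory).

```python
import os
import shutil


def find_ocrtool(extra_candidates=()):
    """Return an absolute path to ocrtool-mcp, or None if it cannot be found."""
    # First look on PATH (covers Homebrew and /usr/local/bin installs).
    found = shutil.which("ocrtool-mcp")
    if found:
        return os.path.abspath(found)
    # Then try explicit locations, such as a source build's .build/release directory.
    for candidate in extra_candidates:
        path = os.path.abspath(os.path.expanduser(candidate))
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


# Example: print(find_ocrtool(["~/ocrtool-mcp/.build/release/ocrtool-mcp"]))
```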

Issue 2: Claude Desktop Cannot Call MCP Server

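Solution:

Check that the configuration file is in the correct location and contains valid JSON
Check that the command path points to an existing, executable file
Fully quit and restart the Claude Desktop application
Check the MCP log files in ~/Library/Logs/Claude/ for error messages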

Issue 3: Empty Recognition Results

Solution:

Verify that the image path is correct and the file exists
Verify that the image format is supported (PNG, JPG, JPEG, BMP, GIF, TIFF)
Check whether the image actually contains recognizable text
Make sure enhanced is not set to false; enhanced recognition is enabled by default and gives better accuracy
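The first two checks above can be scripted before calling the tool. This is a hedged sketch with a hypothetical `preflight` helper; it judges the format by file extension only (the list above), so a mislabeled or corrupt file will still pass.

```python
from pathlib import Path

# Image formats listed in the troubleshooting notes above (checked by extension only).
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tiff"}


def preflight(image_path):
    """Return a list of problems that would likely cause empty OCR results."""
    path = Path(image_path).expanduser()
    problems = []
    if not path.is_file():
        problems.append(f"file not found: {path}")
    elif path.stat().st_size == 0:
        problems.append(f"file is empty: {path}")
    if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")
    return problems


# Example: print(preflight("~/Desktop/screenshot.png") or "looks OK")
```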

Issue 4: Permission Error

Solution: make the binary executable (adjust the path if your binary is installed elsewhere):

chmod +x .build/release/ocrtool-mcp

📁 Project Structure

.
├── Package.swift                      # Swift package configuration
├── Sources/OCRToolMCP/main.swift      # Main program source
├── test/python/rename_images_by_ocr.py # Python usage example
├── README.md                          # English documentation
├── README.zh.md                       # Chinese documentation
├── LICENSE                            # MIT License
└── .gitignore

👨‍💻 Author

Hu Gang (ihugang)

🤝 Contributing

Issues and Pull Requests are welcome!

📝 License

MIT License
