Skip to content

ihugang/ocrtool-mcp

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

18 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

ocrtool-mcp

๐Ÿ‡จ๐Ÿ‡ณ ไธญๆ–‡ๆ–‡ๆกฃ

ocrtool-mcp is an open-source macOS-native OCR module built with Swift and Vision framework, designed to comply with the Model Context Protocol (MCP). It can be invoked by AI IDE tools like Claude Desktop, Cursor, Continue, Windsurf, Cline, Cherry Studio, or custom agents using JSON-RPC over stdin.

platform language mcp license


โœจ Features

  • โœ… Accurate OCR powered by macOS Vision Framework
  • โœ… Recognizes both Chinese and English text
  • โœ… MCP-compatible JSON-RPC interface
  • โœ… Returns line-wise OCR results with bounding boxes (in pixels)
  • โœ… Multiple image input methods (local path, URL, Base64)
  • โœ… Flexible output formats (plain text, Markdown table, JSON, code comments)
  • โœ… Lightweight, fast, and fully offline
  • โœ… Open source free software

๐Ÿ“ฆ Installation

Method 1: Using Homebrew (Easiest)

brew tap ihugang/ocrtool
brew install ocrtool-mcp

Ready to use after installation:

ocrtool-mcp --help

Method 2: Download Pre-built Binary

Download the pre-compiled Universal Binary that supports all Macs (Intel and Apple Silicon):

# Download latest version (v1.0.0)
curl -L -O https://github.com/ihugang/ocrtool-mcp/releases/download/v1.0.0/ocrtool-mcp-v1.0.0-universal-macos.tar.gz

# Extract
tar -xzf ocrtool-mcp-v1.0.0-universal-macos.tar.gz

# Make executable
chmod +x ocrtool-mcp-v1.0.0-universal

# Move to system path (recommended)
sudo mv ocrtool-mcp-v1.0.0-universal /usr/local/bin/ocrtool-mcp

# Verify installation
ocrtool-mcp --help

Alternatively, you can download directly from the GitHub Releases page.

Method 3: Build from Source

If you prefer to build from source or contribute to development:

git clone https://github.com/ihugang/ocrtool-mcp.git
cd ocrtool-mcp
swift build -c release

The executable will be located at .build/release/ocrtool-mcp


๐Ÿš€ Quick Start

View Help

ocrtool-mcp --help
# Or if built from source
.build/release/ocrtool-mcp --help

Run as MCP Module

ocrtool-mcp
# Or if built from source
.build/release/ocrtool-mcp

Send a JSON-RPC request via stdin:

{
  "jsonrpc": "2.0",
  "id": "1",
  "method": "ocr_text",
  "params": {
    "image": "test.jpg",
    "lang": "zh+en",
    "format": "text"
  }
}

๐Ÿ“‹ Parameters

Core Parameters

Parameter Type Description Example
image / image_path String Local image path (supports relative path and ~ expansion) "~/Desktop/test.jpg"
url String Image URL (auto-download) "https://example.com/img.jpg"
base64 String Base64-encoded image data "iVBORw0KGgo..."
lang String Recognition languages, separated by + "zh+en" (default)
"en"
enhanced Boolean Use enhanced recognition true (default)
format String Output format See format options below
output.insertAsComment Boolean Format result as code comments true / false
output.language String Language style for code comments "python", "swift", "html"

Note: Exactly one of image/image_path, url, or base64 must be provided.

Output Format Options (format parameter)

Format Value Description Output Example
text / simple Plain text, one line per result Hello\nWorld
table / markdown Markdown table (with coordinates) See examples below
structured / full Full JSON-RPC response (with bbox) See Quick Start section
auto Auto-select: text for single line, table for multiple -

๐Ÿ›  AI Tool Configuration Guide

Claude Desktop (Claude Code)

Claude Desktop uses claude_desktop_config.json to configure MCP servers.

Configuration File Location:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

Configuration Example:

{
  "mcpServers": {
    "ocrtool": {
      "command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
    }
  }
}

Usage:

In Claude Desktop chat:

Please recognize this image: ~/Desktop/screenshot.png

Or more specifically:

Use ocr_text tool to recognize text in ~/Desktop/receipt.jpg and output as a table

Cursor

Configuration File Location:

  • macOS: ~/.cursor/config.json or via Cursor Settings UI

Configuration Example:

{
  "mcpServers": {
    "ocrtool-mcp": {
      "command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
    }
  }
}

Usage:

In Cursor AI chat:

@ocrtool-mcp recognize text from this image: ./assets/diagram.png

Continue

Configuration File Location:

  • macOS: ~/.continue/config.json

Configuration Example:

{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "name": "ocrtool-mcp",
        "command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
      }
    ]
  }
}

Windsurf

Configuration (via Settings UI):

  1. Open Windsurf Settings
  2. Find MCP Servers configuration
  3. Add new server:
    • Name: ocrtool-mcp
    • Command: /path/to/ocrtool-mcp/.build/release/ocrtool-mcp

Cline (VSCode Extension)

Configuration File Location:

  • macOS: ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json

Configuration Example:

{
  "mcpServers": {
    "ocrtool-mcp": {
      "command": "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp"
    }
  }
}

Cherry Studio

Configuration (via UI):

  1. Open Cherry Studio Settings
  2. Navigate to Settings โ†’ MCP Servers โ†’ Add Server
  3. Fill in server information:
    • Name: ocrtool-mcp
    • Type: STDIO
    • Command: /path/to/ocrtool-mcp/.build/release/ocrtool-mcp
    • Arguments: (leave empty)
    • Environment Variables: (leave empty)
  4. Save configuration

Usage:

In Cherry Studio chat interface, if the model supports MCP tool calls, you'll see a wrench icon to directly invoke OCR functionality:

Recognize text from this image: ~/Desktop/screenshot.png

๐Ÿ’ก Usage Examples

Example 1: Recognize Local Image (Plain Text Output)

{
  "jsonrpc": "2.0",
  "id": "1",
  "method": "ocr_text",
  "params": {
    "image": "~/Desktop/screenshot.png",
    "format": "text"
  }
}

Output:

ไฝ ๅฅฝไธ–็•Œ
Hello World

Example 2: Recognize Image from URL (Markdown Table)

{
  "jsonrpc": "2.0",
  "id": "2",
  "method": "ocr_text",
  "params": {
    "url": "https://example.com/receipt.jpg",
    "lang": "zh+en",
    "format": "markdown"
  }
}

Output:

| Text | X | Y | Width | Height |
|------|---|---|--------|--------|
| ๅ•†ๅ“ๅ็งฐ | 120 | 50 | 200 | 30 |
| ๆ€ป่ฎก๏ผšยฅ99.00 | 120 | 450 | 250 | 28 |

Example 3: Base64 Image Recognition

{
  "jsonrpc": "2.0",
  "id": "3",
  "method": "ocr_text",
  "params": {
    "base64": "iVBORw0KGgoAAAANSUhEUgAAAAUA...",
    "format": "structured"
  }
}

Example 4: Generate Python Comment Format

{
  "jsonrpc": "2.0",
  "id": "4",
  "method": "ocr_text",
  "params": {
    "image": "./code_screenshot.png",
    "output.insertAsComment": true,
    "output.language": "python"
  }
}

Output:

# def hello():
#     print("Hello World")

๐Ÿ Python Usage Example

The project includes a practical Python example script test/python/rename_images_by_ocr.py that demonstrates how to use OCR to automatically rename garbled image files on the desktop.

import json
import subprocess

def ocr_image(image_path, ocr_tool_path):
    """Call ocrtool-mcp to recognize image"""
    json_rpc = json.dumps({
        "jsonrpc": "2.0",
        "id": "1",
        "method": "ocr_text",
        "params": {
            "image": image_path,
            "format": "structured",
            "lang": "zh+en"
        }
    })

    cmd = f"echo '{json_rpc}' | {ocr_tool_path}"
    proc = subprocess.Popen(cmd, shell=True,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE)
    out, err = proc.communicate()

    result = json.loads(out.decode())
    return result.get("result", {}).get("lines", [])

# Usage example
lines = ocr_image("~/Desktop/test.png",
                  "/path/to/ocrtool-mcp/.build/release/ocrtool-mcp")
for line in lines:
    print(f"Text: {line['text']}, BBox: {line['bbox']}")

๐Ÿ”ง Troubleshooting

Issue 1: "command not found"

Solution: Use the full path to the compiled executable, e.g.:

/Users/username/ocrtool-mcp/.build/release/ocrtool-mcp

Issue 2: Claude Desktop Cannot Call MCP Server

Solution:

  1. Check that the configuration file path is correct
  2. Restart Claude Desktop application
  3. Check log files (if available) for error messages

Issue 3: Empty Recognition Results

Solution:

  1. Verify the image path is correct and file exists
  2. Verify the image format is supported (PNG, JPG, JPEG, BMP, GIF, TIFF)
  3. Check if the image contains recognizable text
  4. Try using enhanced: true parameter for better accuracy

Issue 4: Permission Error

Solution:

chmod +x .build/release/ocrtool-mcp

๐Ÿ“ Project Structure

.
โ”œโ”€โ”€ Package.swift                      # Swift package configuration
โ”œโ”€โ”€ Sources/OCRToolMCP/main.swift      # Main program source
โ”œโ”€โ”€ test/python/rename_images_by_ocr.py # Python usage example
โ”œโ”€โ”€ README.md                          # English documentation
โ”œโ”€โ”€ README.zh.md                       # Chinese documentation
โ”œโ”€โ”€ LICENSE                            # MIT License
โ””โ”€โ”€ .gitignore

๐Ÿ‘จโ€๐Ÿ’ป Author

๐Ÿค Contributing

Issues and Pull Requests are welcome!

๐Ÿ“ License

MIT License

About

MCP OCR module using macOS Vision framework

Resources

License

Stars

Watchers

Forks

Packages

No packages published