Imagine this scenario: you are working with Claude or ChatGPT and receive an image that contains important text. The traditional approach is to retype the text by hand, or to run a third-party OCR tool first and paste the result into the conversation. Either way, the process is cumbersome and interrupts your workflow.
What if your AI assistant could directly “see” and understand the text in images?
That is exactly the problem RapidOCR MCP Server aims to solve.
🤔 What is MCP? Why Does It Matter?
MCP (Model Context Protocol) is an open protocol launched by Anthropic, designed to standardize how AI assistants connect with external tools and data sources.
Simply put, MCP is like the “USB interface” for the AI world:
- Before: Each AI assistant had its own plugin system, incompatible with each other
- Now: Tools following the MCP protocol can be used by any AI assistant that supports MCP
This means once you’ve configured RapidOCR MCP Server, Claude, ChatGPT, Cursor, and other MCP-compatible AI assistants can all directly use its OCR capabilities.
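To make the idea concrete, here is a minimal sketch of the JSON-RPC 2.0 message an MCP client sends when it invokes a tool. The `tools/call` method and the `name`/`arguments` fields come from the MCP specification, and the tool name `ocr_by_path` appears later in this article. The `path` argument name is an assumption for illustration and may differ from the server's real tool schema.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# The "path" argument name is hypothetical; check the server's tool schema.
request = build_tool_call(1, "ocr_by_path", {"path": "/tmp/invoice.png"})
print(request)
```

In practice the AI assistant builds and sends this message for you; you never write it by hand.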
🚀 Core Features of RapidOCR MCP
1. Multi-Mode Support, Flexible Deployment
RapidOCR MCP supports three running modes to adapt to different scenarios:
| Mode | Use Case | Launch Command |
|---|---|---|
| MCP stdio | Local AI assistants (e.g., Claude Desktop) | uvx rapidocr-mcp |
| FastAPI HTTP | Remote services, Web applications | uv run rapidocr-mcp --mode fastapi |
| streamable-http | Real-time streaming processing | Built-in support |
2. Multiple Input Methods, Handle Anything
No matter where your images are, RapidOCR MCP can process them:
- Local file paths: Directly read images from your computer
- Base64 encoding: Process image data embedded in messages
- URLs: Automatically download and recognize web images
- File upload: Upload image files via HTTP API
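For the Base64 input method, the image bytes have to be encoded before they are sent. The following is a small sketch using only the Python standard library. The helper names are hypothetical, and whether the server accepts a `data:` URL prefix is an assumption; the decoder below simply tolerates one.

```python
import base64
from pathlib import Path


def encode_image(path: str) -> str:
    """Read an image file and return its bytes as a Base64 ASCII string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def decode_image(payload: str) -> bytes:
    """Reverse of encode_image; strips an optional data-URL prefix first."""
    if payload.startswith("data:") and "," in payload:
        payload = payload.split(",", 1)[1]
    return base64.b64decode(payload)
```

Keep in mind that Base64 inflates the payload by roughly a third, so very large images are usually better passed by path or URL.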
3. Batch OCR, Double the Efficiency
Need to process multiple images at once? The ocr_batch tool recognizes several images in a single call, so you do not have to submit them one by one.
4. Intelligent Image Preprocessing
Built-in image enhancement can improve recognition accuracy:
- Auto-enhance: Adjust contrast and sharpness
- Auto-rotate: Fix image orientation based on EXIF information
- Binarization: Convert color/grayscale images to black and white for better text recognition
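To illustrate what binarization does, here is a dependency-free sketch that works on a flat list of 8-bit grayscale values. It picks the threshold with Otsu's method, a common choice for this step. This article does not show the server's actual preprocessing code, so treat this as an illustration of the technique rather than the project's implementation.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Return the Otsu threshold for 8-bit grayscale values (0-255)."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    sum_background = 0
    weight_background = 0
    best_threshold, best_variance = 0, -1.0

    for level, count in enumerate(histogram):
        weight_background += count
        if weight_background == 0:
            continue  # no pixels at or below this level yet
        weight_foreground = total - weight_background
        if weight_foreground == 0:
            break  # every pixel is already in the background class
        sum_background += level * count
        mean_background = sum_background / weight_background
        mean_foreground = (sum_all - sum_background) / weight_foreground
        # Between-class variance (up to a constant factor).
        variance = (weight_background * weight_foreground
                    * (mean_background - mean_foreground) ** 2)
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels: list[int]) -> list[int]:
    """Map every pixel to 0 (black) or 255 (white) around the Otsu threshold."""
    threshold = otsu_threshold(pixels)
    return [255 if value > threshold else 0 for value in pixels]
```

Dark text pixels end up as 0 and the lighter background as 255, which gives the recognizer a cleaner separation between ink and paper.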
5. Multiple Output Formats
Choose the most suitable output format based on your use case:
- plain: Plain text, simplest and most direct
- json: Structured data with coordinates and confidence scores
- markdown: Suitable for document organization
- structured: Complete structured information, easy for programmatic processing
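The difference between the formats is easiest to see side by side. The sketch below renders the same recognition result as plain text, JSON, and markdown. The `TextLine` shape and the `render` function are hypothetical and only approximate what an OCR result with coordinates and confidence scores might look like; the server's real schema may differ, and the structured format is left out because the article does not describe its fields.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TextLine:
    """One recognized line; this shape is hypothetical, not the server's schema."""
    text: str
    box: list[list[int]]  # four [x, y] corner points
    confidence: float


def render(lines: list[TextLine], output_format: str = "plain") -> str:
    """Render recognized lines in one of the illustrated output formats."""
    if output_format == "plain":
        return "\n".join(line.text for line in lines)
    if output_format == "json":
        return json.dumps([asdict(line) for line in lines], ensure_ascii=False)
    if output_format == "markdown":
        return "\n".join(f"- {line.text}" for line in lines)
    raise ValueError(f"unsupported output format: {output_format}")


lines = [TextLine("Invoice No. 42", [[0, 0], [90, 0], [90, 12], [0, 12]], 0.98)]
print(render(lines, "plain"))
print(render(lines, "json"))
```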
6. Enterprise-Grade Security and Monitoring
- Path whitelist: Restrict accessible file paths to prevent unauthorized access
- API key: Authentication for HTTP mode
- CORS support: Secure cross-origin configuration
- Audit logging: Record all OCR requests
- Prometheus metrics: Monitor request volume, latency, error rates
- OpenTelemetry tracing: Distributed tracing support
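A path whitelist is only effective if paths are normalized before they are compared; otherwise a request such as /tmp/../etc/passwd slips through a naive prefix check. Here is a minimal sketch of that idea with `pathlib`. The function name is hypothetical and this is not taken from the project's source.

```python
from pathlib import Path


def is_path_allowed(candidate: str, whitelist: list[str]) -> bool:
    """Return True if `candidate` resolves to a location inside a whitelisted directory."""
    # resolve() collapses ".." segments and follows symlinks before comparing.
    resolved = Path(candidate).resolve()
    for allowed in whitelist:
        root = Path(allowed).resolve()
        if resolved == root or root in resolved.parents:
            return True
    return False


print(is_path_allowed("/tmp/scan.png", ["/tmp"]))
print(is_path_allowed("/tmp/../etc/passwd", ["/tmp"]))
```

Comparing resolved path components, rather than raw strings, also rejects look-alike directories such as /tmpfoo when only /tmp is allowed.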
🛠️ Technical Architecture
Core Components
┌─────────────────────────────────────────────────────────────┐
│                     RapidOCR MCP Server                     │
├─────────────────────────────────────────────────────────────┤
│   ┌──────────────┐   ┌──────────────┐   ┌──────────────┐    │
│   │  MCP Server  │   │ FastAPI HTTP │   │   Registry   │    │
│   │   (stdio)    │   │    Server    │   │  (Engines)   │    │
│   └──────┬───────┘   └──────┬───────┘   └──────┬───────┘    │
│          │                  │                  │            │
│          └──────────────────┼──────────────────┘            │
│                             ▼                               │
│                    ┌──────────────────┐                     │
│                    │   OCR Service    │                     │
│                    │   (Singleton)    │                     │
│                    └────────┬─────────┘                     │
│                             ▼                               │
│                    ┌──────────────────┐                     │
│                    │ RapidOCR Engine  │                     │
│                    │  (ONNX Runtime)  │                     │
│                    └──────────────────┘                     │
└─────────────────────────────────────────────────────────────┘
Why RapidOCR?
RapidOCR is a high-performance OCR engine based on ONNX Runtime, offering advantages over traditional OCR solutions:
- Fast: Inference runs on ONNX Runtime, an engine built for high-performance model execution
- Accurate: Deep learning models provide excellent Chinese and English recognition
- Lightweight: Small model files, easy to deploy
- Cross-platform: Supports Windows, macOS, Linux
📦 Quick Start
Option 1: Run with uvx (Recommended)
uvx fetches the package into an isolated, cached environment and runs it in one command, so no manual dependency installation is needed:
uvx rapidocr-mcp
Option 2: pip Installation
pip install rapidocr-mcp
rapidocr-mcp
Configure Claude Desktop
Edit the configuration file (Windows: %APPDATA%\Claude\claude_desktop_config.json; macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "rapidocr": {
      "command": "uvx",
      "args": ["rapidocr-mcp"]
    }
  }
}
After restarting Claude Desktop, you can directly ask Claude to recognize text in images!
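If your configuration file already lists other MCP servers, pasting the snippet over it would drop them. The sketch below merges the rapidocr entry into an existing configuration instead. The function name and the "other" server entry are hypothetical; only the rapidocr entry itself comes from this article.

```python
import json


def add_rapidocr_server(config: dict) -> dict:
    """Return a copy of a Claude Desktop config with the rapidocr entry added."""
    merged = dict(config)
    # Copy the existing server map so the caller's dict is not modified.
    servers = dict(merged.get("mcpServers", {}))
    servers["rapidocr"] = {"command": "uvx", "args": ["rapidocr-mcp"]}
    merged["mcpServers"] = servers
    return merged


# "other" stands in for whatever servers you may already have configured.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
print(json.dumps(add_rapidocr_server(existing), indent=2))
```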
💡 Usage Examples
Example 1: Recognize Local Image
Simply tell Claude:
“Please help me recognize the text in this image: /Users/myname/Documents/invoice.png”
Claude will automatically call the ocr_by_path tool and return the recognition results.
Example 2: Recognize Web Image
“What does this image say? https://example.com/document.jpg”
Claude will call the ocr_by_url tool to download the image and recognize its text.
Example 3: Batch Processing
“Please help me recognize text from all images in this folder: /Users/myname/Documents/scanned/”
Claude will call the ocr_batch tool for batch processing.
🔧 Advanced Configuration
Customize behavior through environment variables:
# Set recognition language (ch=Chinese+English, en=English)
export RAPIDOCR_LANG=ch
# Set log level
export RAPIDOCR_LOG_LEVEL=INFO
# Enable API key authentication (HTTP mode)
export RAPIDOCR_API_KEY=your-secret-key
# Configure path whitelist
export RAPIDOCR_PATH_WHITELIST=/home/user/documents,/tmp
# Set maximum image size (bytes)
export RAPIDOCR_MAX_IMAGE_SIZE=10485760
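To show how these variables fit together, here is a sketch of how a server might read them into a typed settings object. The variable names come from the list above; the `Settings` class, the `load_settings` function, and all default values are assumptions made for this example, not documented behavior of RapidOCR MCP.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping, Optional


@dataclass
class Settings:
    # Defaults are assumptions for this sketch, not documented server defaults.
    lang: str = "ch"
    log_level: str = "INFO"
    api_key: Optional[str] = None
    path_whitelist: list[str] = field(default_factory=list)
    max_image_size: int = 10 * 1024 * 1024  # 10485760 bytes = 10 MiB


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from RAPIDOCR_* variables, falling back to defaults."""
    defaults = Settings()
    raw_whitelist = env.get("RAPIDOCR_PATH_WHITELIST", "")
    return Settings(
        lang=env.get("RAPIDOCR_LANG", defaults.lang),
        log_level=env.get("RAPIDOCR_LOG_LEVEL", defaults.log_level).upper(),
        api_key=env.get("RAPIDOCR_API_KEY") or None,
        # The whitelist is a comma-separated list of directories.
        path_whitelist=[p.strip() for p in raw_whitelist.split(",") if p.strip()],
        max_image_size=int(env.get("RAPIDOCR_MAX_IMAGE_SIZE", defaults.max_image_size)),
    )


print(load_settings({"RAPIDOCR_PATH_WHITELIST": "/home/user/documents,/tmp"}))
```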
🐳 Docker Deployment
For production environments, you can quickly deploy using Docker:
docker-compose -f docker/docker-compose.yml up
Once the service is running, you can access it via HTTP API:
curl -X POST http://localhost:8080/ocr/path \
  -F "path=/app/sample.png" \
  -F "output_format=json"
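The same call can be made from Python with only the standard library. curl's -F flag sends multipart/form-data, so the sketch below builds that body by hand. The endpoint and the two form fields are taken from the curl command above; the function names are hypothetical, and the response is printed as raw text because its exact shape is not documented here.

```python
import urllib.request
import uuid


def build_ocr_request(base_url: str, path: str,
                      output_format: str = "json") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST matching the curl call."""
    boundary = uuid.uuid4().hex
    fields = {"path": path, "output_format": output_format}
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    body = "".join(parts).encode("utf-8")
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(
        f"{base_url}/ocr/path", data=body, headers=headers, method="POST"
    )


def run_ocr(base_url: str, path: str) -> str:
    """Send the request to a running server and return the raw response text."""
    request = build_ocr_request(base_url, path)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

With the Docker service running, `run_ocr("http://localhost:8080", "/app/sample.png")` would perform the request. Note that the path refers to a file inside the container, not on your host machine.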
🎯 Use Cases
1. Personal Knowledge Management
Convert paper notes and book scans into searchable text, then import it into Obsidian, Notion, and other knowledge bases.
2. Automated Office Work
Batch process invoices, contracts, and reports, and extract key information into spreadsheets or databases.
3. Development Assistance
Recognize code and error logs from screenshots so that AI assistants can analyze and fix them directly.
4. Content Creation
Extract copy and data from images for writing, reports, and presentations.
5. Accessibility Support
Help visually impaired users “read” text content in images.
🔮 Future Roadmap
- Support for more OCR engines (Tesseract, PaddleOCR, etc.)
- Table recognition and structured output
- Handwriting recognition optimization
- Extended multilingual support
- Model quantization and edge device deployment
🤝 Contributing
RapidOCR MCP is an open-source project, and all forms of contribution are welcome:
- Submit Issues to report bugs or make suggestions
- Submit Pull Requests to improve code
- Improve documentation and tutorials
- Share with more people who need it
Project URL: https://github.com/bitfarer/rapidocr-mcp
📄 License
MIT License - You are free to use, modify, and distribute this project.
Let AI truly “see” the world, starting with RapidOCR MCP.