OCR & document extraction

Coding work often touches non-code artifacts — scanned specs, screenshots, PDFs, diagrams with formulas. OCR/extraction tooling lets Pi pull structured text out of them and reason over it.

1. pi-ocr

pi-ocr is a multi-backend OCR extension. It extracts text, formulas, and tables from images and PDFs using one of three backends: MinerU (free cloud), Ollama (local GPU, including LaTeX formulas), or Pix2Text (local Python). It works with zero configuration by default.

pi install npm:pi-ocr

Benefits

  • Multiple backends — cloud or fully local, depending on privacy/compute needs.
  • Handles formulas and tables, not just plain text.
  • Zero-config out of the box.

Drawbacks

  • Local backends require GPU (Ollama) or a Python setup (Pix2Text).
  • Cloud backend sends documents off-machine — mind sensitive data.

2. pi-web-access

If your documents are mostly PDFs reachable by URL, pi-web-access already includes PDF extraction alongside web search and fetching, so one package covers both research and PDFs.

pi install npm:pi-web-access

Benefits

  • One install covers PDFs + web research.
  • No separate OCR stack to manage.

Drawbacks

  • PDF text extraction only — no image OCR, formulas, or tables.
  • Struggles with scanned (image-only) PDFs, which have no text layer to extract.
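
One way to notice an image-only PDF is to check how much text the extraction step actually returned. The sketch below is a heuristic under stated assumptions: `page_texts` stands for whatever per-page text your extractor produced, and the function name, the 20-character threshold, and the "more than half the pages" rule are illustrative choices, not part of pi-web-access.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Heuristically flag a PDF as image-only (scanned).

    page_texts: list of strings, one per page, from any text extractor
        (hypothetical input; this is not a pi-web-access API).
    min_chars_per_page: assumed cutoff below which a page counts as empty.
    """
    if not page_texts:
        return True  # nothing was extracted at all
    # Count pages whose extracted text is essentially empty.
    empty = sum(1 for text in page_texts if len(text.strip()) < min_chars_per_page)
    # Flag the document when more than half of its pages carry no usable text.
    return empty / len(page_texts) > 0.5
```

If the check returns `True`, routing the file to a real OCR tool is likely the better path than retrying text extraction.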

3. An MCP document server via pi-mcp-adapter

For specialized pipelines, run a dedicated document/OCR MCP server and connect it with pi-mcp-adapter.

pi install npm:pi-mcp-adapter

Benefits

  • Reuse best-in-class MCP document servers.
  • Swap servers without changing your Pi setup.

Drawbacks

  • You provision and run the MCP server yourself.
  • More setup than a turnkey extension.

Recommendation

  • Need real OCR (images, scans, formulas, tables): pi-ocr.
  • Only need text from URL-based PDFs: pi-web-access.
  • Have a preferred document pipeline already: wrap it as an MCP server and use pi-mcp-adapter.
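
The three recommendations can be read as a small decision rule. The following is a minimal sketch, not a Pi feature: the function names and boolean flags are hypothetical, and checking the cases in the order they are listed is an assumption about priority. Only the package names and the `pi install npm:<package>` command form come from this page.

```python
def recommend_extension(needs_ocr, url_pdfs_only, has_own_pipeline):
    """Map the three cases above to a package name (hypothetical helper).

    Cases are checked in the order listed; that ordering is an assumption.
    Returns None when none of the cases applies.
    """
    if needs_ocr:
        return "pi-ocr"          # images, scans, formulas, tables
    if url_pdfs_only:
        return "pi-web-access"   # text from URL-based PDFs
    if has_own_pipeline:
        return "pi-mcp-adapter"  # existing pipeline wrapped as an MCP server
    return None


def install_command(package):
    """Build the install command in the form used on this page."""
    return f"pi install npm:{package}"


choice = recommend_extension(needs_ocr=False, url_pdfs_only=True, has_own_pipeline=False)
if choice is not None:
    print(install_command(choice))
```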