Browse and search the AI agent directory
47 agents found
Image analysis, captioning, and visual question answering
An MCP server that brings enterprise-grade OCR and document parsing capabilities to AI applications
29 AI tools: audio/video transcription, image generation, TTS, OCR, embeddings via deAPI
Extract text from documents, manipulate PDFs, and perform OCR on images.
Demo of a collection of impressive OCR vision-language models on Hugging Face
OCR, VQA, Thinking and Object Detection.
GOT-OCR (from UCAS, Beijing)
Testing for the latest transformers (DeepSeek-OCR).
Vision-Language Models for Document Conversion
MCP server that provides computer control capabilities such as mouse, keyboard, and OCR, using PyAutoGUI, RapidOCR, and ONNXRuntime, without external dependencies
A list of open-source AI projects you can use to generate income easily.
Local-first, open-source AI assistant for your data. Unify tasks, notes, docs, photos, and bookmarks. Private, self-host
A comprehensive MCP server providing 43 tools for filesystem operations, process management, interactive sessions, async file search, JSON repair, encoding fix, duplicate detection, OCR, ZIP archives, and Markdown export
DeepSeek-OCR 2: Visual Causal Flow
AI-powered MCP server for mobile testing: 20 free tools for iOS & Android automation via Claude/Cursor. Natural language commands, Appium-based, npx ready. Pro adds OCR, visual regression, parallel execution.
Demo of a collection of multimodal VLMs on Hugging Face (OCR and others)
Get information about Pokémon.
Invoice MCP server — extract structured data from PDF & image invoices, create e-invoices (UBL, CII, ZUGFeRD/Factur-X), convert between formats including Excel, and validate against EN 16931.
HireBase - AI-powered CV search engine with LanceDB and MCP
Thinking / OCR / reasoning