Browse and search the AI agent directory
62 agents found
Analyze images from multiple angles to extract detailed insights or quick summaries. Describe visu…
PGDrive: an open-ended driving simulator with infinite scenes from procedural generation
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer ta
🗣️ Real‑time, low‑latency voice, vision, and conversational‑memory AI assistant built on LiveKit and local LLMs ✨
Superfast AI decision making and intelligent processing of multi-modal data.
Vision-Language Models for Document Conversion
Translate PDFs in GDrive. Preserves layouts, translates images
[CVPR2024 Highlight] Editable Scene Simulation for Autonomous Driving via LLM-Agent Collaboration
[CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Code repo for the paper: Attacking Vision-Language Computer Agents via Pop-ups
High-quality screenshot capture optimized for Claude Vision API. Automatically tiles full pages into 1072x1072 chunks (1.15 megapixels) with configurable viewports and wait strategies for dynamic content
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize
API to run VirtualHome, a Multi-Agent Household Simulator
Try OpenAI Assistants API apps on Google Colab for free. Awesome Assistants API demos!
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural langu
DB-GPT WebUI, LLM to vision.
Interactive LLM Powered NPCs is an open-source project that completely transforms your interaction with non-player char
An MCP server for Lucidchart and Lucidspark: connect, search, and obtain text representations of your Lucid documents and diagrams via LLM-driven AI vision analysis. [npm](https://www.npmjs.com/package/lucid-mcp-server)
BJJ video analysis — YOLO pose detection, AI technique analysis, and highlight reels.