Browse and search the AI agent directory
62 agents found
An MCP server that uses an external oversight layer to "vibe check" agents and to improve accuracy and user alignment over time. Prevents scope creep, code bloat, misalignment, misinterpretation, tunnel vision, and overcomplication
Florence-2 vision models demo. (transformers)
View ROS 2 nodes and topics, and call services and actions via MCP
Desktop GUI automation using accessibility APIs. Control Windows, macOS, and Linux applications without vision models or screenshots. Supports workflow recording, structured data extraction, and browser DOM inspection
Real-time screen analysis, context-aware recording, and UI monitoring MCP server. Supports AI vision, event hooks, and multimodal agent workflows
Multimodal AI vision MCP server for image, video, and object detection analysis. Enables UI/UX evaluation, visual regression testing, and interface understanding using Google Gemini and Vertex AI
Fast screenshot capture tool optimized for Claude Vision API. Automatically tiles full pages into 1072x1072 chunks for optimal AI processing with configurable viewports and wait strategies for dynamic content
Use HuggingFace Spaces directly from Claude. Use Open Source Image Generation, Chat, Vision tasks and more. Supports Image, Audio and text uploads/downloads
SOC is a framework enabling multimodal models to operate a computer using human-like inputs and outputs, with compatibil
vimGPT is a project that integrates GPT-4V's vision capabilities with the Vimium extension to enable web browsing and in
Anthropic Claude API wrapper for Go
Speech-to-speech AI assistant with natural conversation flow, mid-speech interruption, vision capabilities and AI-initia
🚀🚀🚀 A collection of awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision L
⚕️GenAI powered multi-agentic medical diagnostics and healthcare research assistance chatbot. 🏥 Designed for healthcare
✔ (Completed) Extremely comprehensive deep learning notes [Tudui, PyTorch] [Mu Li, Dive into Deep Learning] [Andrew Ng, Deep Learning] [Dafei, LLM Agents]
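[三年面试五年模拟 (Three Years of Interviews, Five Years of Mock Exams)] An interview handbook for AIGC algorithm engineers. Covers practical interview and written-test experience and core knowledge across the AI industry, including AIGC, LLMs, traditional deep learning, autonomous driving, AI agents, machine learning, computer vision, natural language processing, reinforcement learning, big data mining, embodied intelligence, the metaverse, and AGI.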
This repo is a live list of papers on game playing and large multimodal models - "A Survey on Game Playing Agents and
A lightweight Python API wrapper and CLI for Google’s Gemini language models.
MineContext is your proactive, context-aware AI partner (Context-Engineering + ChatGPT Pulse)
Scalable and extensible reinforcement learning for LM agents.