Multi-Agent Unified Dispatch Engine

MAUDE

Nemotron, Codestral, Claude, Mistral, Devstral, LLaVA, Whisper, Tailscale, Docker, Python, React, Capacitor


MAUDE is a full-stack AI operating environment running on the NVIDIA DGX Spark. It coordinates 10+ specialized models through a unified gateway, routing between local inference (Nemotron), cloud code generation (Codestral, Devstral), vision analysis (LLaVA), and frontier reasoning (Claude, Mistral) based on task requirements.

Over 100 tools span file operations, web browsing, Google Workspace (Gmail, Drive, Sheets, Calendar, Slides, Contacts, YouTube), GitHub (PRs, issues, CI/CD, releases), browser automation, image generation, social media posting, and system monitoring. Tools are dynamically filtered per message to minimize token usage.

The system runs across five client interfaces: a server TUI on the Spark, a pip-installable Mac/PC/Linux CLI, a native iOS & Android app, a Telegram bot, and a web dashboard. All of them connect over Tailscale VPN through a single gateway that handles SSE streaming, WebSocket terminal/voice proxying, file transfers, and tool execution loops.

Gallery

MAUDE Feature Video

MAUDE Server TUI

MAUDE Client on Mac

Case Study

The Problem

Every commercial AI assistant locks you into one model, one provider, and their cloud. If you want Claude's reasoning, Mistral's speed, and a local model for privacy, you need three different apps, three billing accounts, and no shared context between them. For a power user who wants AI integrated into their actual workflow (files, email, calendar, code, devices), the gap between what chatbots offer and what's needed is enormous.

Design Challenge

How do you build a single AI interface that spans 10+ models, 100+ tools, five client surfaces (desktop TUI, CLI, mobile app, Telegram bot, web dashboard), and multiple physical machines without the complexity leaking through to the user? The user should say what they want and the system should figure out which model, which tools, and which machine to use.

Key Design Decisions

The central architectural choice was a unified gateway that absorbs all complexity. Every client connects to one endpoint. The gateway resolves model aliases, translates between API formats (OpenAI vs. Anthropic), executes tools server-side, manages context windows, and streams results back with optional trace visibility. Tool selection is dynamic: keyword filtering on each message activates only relevant tools, keeping token usage 30-40% lower than sending the full 100+ tool catalog. For the mobile app, I designed collapsible tool execution traces. Power users can see exactly what the AI is doing (which tools it called, what it found, how long it took), while casual users see only the final response.
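As an illustration of the format translation mentioned above, here is a minimal sketch of converting an OpenAI-style chat request into the shape the Anthropic Messages API expects (system text as a top-level field, and a required max_tokens). The function name and the default of 1024 tokens are assumptions for this example, and it covers plain text messages only, not tool calls or images; it is not MAUDE's actual gateway code.

```python
from typing import Any


def openai_to_anthropic(payload: dict[str, Any], default_max_tokens: int = 1024) -> dict[str, Any]:
    """Translate a plain-text OpenAI chat request into Anthropic Messages shape."""
    system_parts: list[str] = []
    messages: list[dict[str, str]] = []
    for msg in payload.get("messages", []):
        if msg["role"] == "system":
            # Anthropic takes system text as a top-level field, not a message.
            system_parts.append(msg["content"])
        else:
            messages.append({"role": msg["role"], "content": msg["content"]})

    request: dict[str, Any] = {
        "model": payload["model"],
        "messages": messages,
        # max_tokens is required by the Messages API; the default is an assumption.
        "max_tokens": payload.get("max_tokens", default_max_tokens),
    }
    if system_parts:
        request["system"] = "\n\n".join(system_parts)
    return request
```

A client can then keep sending one request format while the gateway decides per model whether translation is needed.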

Outcome

MAUDE is my daily-driver AI system, running 24/7 on a DGX Spark with clients on Mac, iPhone, and Windows. It handles email triage, calendar management, code generation, file operations, web research, image generation, social media posting, and cross-machine task dispatch. The autonomous builder (Forge) can scaffold and deploy complete web applications in Docker sandboxes with zero human intervention.

Development Roadmap

MAUDE is evolving from a powerful local assistant into a durable agent operating system: one gateway, multiple clients, persistent missions, safer autonomous execution, and clearer observability across the whole stack.

Foundation

Shipped

Unified Local AI Gateway

Consolidate local and cloud models behind one OpenAI-compatible gateway with alias routing, streaming responses, context-aware model selection, and a shared tool execution layer for every client.

Interface Layer

Shipped

Five Client Surfaces

Run MAUDE from the Spark TUI, desktop CLI, native mobile app, Telegram, and web Command Center while preserving one backend, one model catalog, and one shared operational view.

Automation

In Progress

Missions and Recurring Agents

Turn one-off prompts into durable missions with schedules, checkpoints, next actions, artifacts, blockers, and self-healing execution plans for recurring creative and operational workflows.

Reliability

Verification and Recovery Loops

Expand automatic verification for generated code, published content, browser actions, social posts, and scheduled tasks, with explicit retry paths and clearer failure summaries.

Observability

Command Center Deepening

Expose richer traces for model routing, tool calls, scheduler runs, mission progress, GPU load, memory usage, cross-device presence, and long-running jobs from one operator dashboard.

Distribution

Planned

Installable Personal Agent OS

Package setup, skills, OAuth configuration, service management, mobile pairing, and documentation so MAUDE can move from a personal DGX Spark system toward a repeatable local-first AI stack.

System Architecture

Technical Highlights

Multi-Model Gateway

A Python gateway on port 30000 routes all traffic: requests to 10+ models (including Mistral, Codestral, Devstral, Nemotron, Claude Opus, Claude Sonnet, and LLaVA), WebSocket proxying for the SSH terminal and voice, file transfers, tool execution loops, and SSE streaming with real-time pipeline traces. Models are resolved via short aliases with per-model context window awareness.
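Alias resolution with context window awareness could look roughly like the sketch below. The alias names, model identifiers, context sizes, and the reply reserve are all hypothetical values chosen for illustration; the real catalog is not given in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    model_id: str
    context_window: int  # maximum tokens the model accepts


# Hypothetical alias table; real aliases and limits are not stated in the source.
ALIASES = {
    "local": ModelSpec("nemotron-local", 32_000),
    "code": ModelSpec("codestral-cloud", 128_000),
}


def resolve_model(alias: str, prompt_tokens: int, reserve_for_reply: int = 1024) -> ModelSpec:
    """Map a short alias to a model and check that the prompt fits its window."""
    spec = ALIASES.get(alias.lower())
    if spec is None:
        raise KeyError(f"unknown model alias: {alias}")
    if prompt_tokens + reserve_for_reply > spec.context_window:
        raise ValueError(
            f"{prompt_tokens} prompt tokens do not fit {spec.model_id} "
            f"({spec.context_window} token window)"
        )
    return spec
```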

100+ LLM-Callable Tools

File operations, shell execution, web search & browsing, image generation (FLUX + LoRA), vision analysis, browser automation (Playwright), and AI delegation to frontier models. Dynamic keyword-based tool filtering reduces context window usage by 30-40% per request.
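A minimal sketch of keyword-based tool filtering, assuming each tool carries a set of trigger words and a small set of tools is always included. The tool names, keyword sets, and the always-on set are hypothetical; only the general idea (activate a tool when the message mentions one of its keywords) comes from the description above.

```python
import re

# Hypothetical trigger words per tool.
TOOL_KEYWORDS = {
    "gmail_send": {"email", "gmail", "mail"},
    "calendar_events": {"calendar", "meeting", "schedule"},
    "github_create_pr": {"github", "pr", "pull"},
}
# Hypothetical tools that are sent with every request.
ALWAYS_ON = {"read_file"}


def select_tools(message: str) -> list[str]:
    """Return the names of tools whose keywords appear in the message."""
    words = set(re.findall(r"[a-z0-9]+", message.lower()))
    selected = set(ALWAYS_ON)
    for tool, keywords in TOOL_KEYWORDS.items():
        if words & keywords:  # at least one trigger word is present
            selected.add(tool)
    return sorted(selected)
```

Only the schemas for the selected tools would then be attached to the model request, which is where the token savings come from.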

Google Workspace Integration

30+ tools for Gmail (read, compose, send), Drive (search, upload, create docs/sheets/folders), Sheets (read, write, append), Calendar (events, search), Slides (create, edit), Contacts (CRUD, search), and YouTube (search, playlists, comments). Full OAuth 2.0 with a verified domain.

GitHub Integration

25+ tools for pull requests (create, review, merge, diff, comment), issues (create, close, comment), repositories, branches, commits, CI/CD workflow runs (view, re-run), releases, code search, and notifications.

Forge — Autonomous Builder

Plan → Execute → Verify → Fix loop with mandatory verification. Builds software autonomously in a Docker sandbox (Ubuntu 24.04, Python 3.12, Node 22, Go 1.22). Automatic model escalation from free local Nemotron to Codestral to Mistral. Budget-capped token tracking with blueprint templates for web apps, APIs, AI tools, and SaaS MVPs.
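The loop above can be sketched as follows, under several assumptions: the plan is already folded into the execute callable, verification feedback is what drives the fix step, each model gets a fixed number of attempts before escalation, and the build stops once the token budget is exceeded. The function and parameter names are hypothetical, and the model calls are injected as plain callables so the sketch stays runnable.

```python
from typing import Callable

# Escalation order taken from the description: local first, then cloud models.
ESCALATION = ["nemotron", "codestral", "mistral"]


def run_build(
    execute: Callable[[str, str], tuple[str, int]],  # (model, feedback) -> (artifact, tokens used)
    verify: Callable[[str], tuple[bool, str]],       # artifact -> (passed, feedback)
    token_budget: int,
    attempts_per_model: int = 2,                     # assumption, not stated in the source
) -> dict:
    spent = 0
    feedback = ""
    for model in ESCALATION:
        for _ in range(attempts_per_model):
            artifact, used = execute(model, feedback)
            spent += used
            if spent > token_budget:
                return {"status": "budget_exceeded", "model": model, "tokens": spent}
            # Verification is mandatory: nothing is accepted without passing it.
            ok, feedback = verify(artifact)
            if ok:
                return {"status": "verified", "model": model, "tokens": spent}
    return {"status": "failed", "model": ESCALATION[-1], "tokens": spent}
```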

Cross-Machine Collaboration

Dispatch tasks to remote clients over Tailscale mesh. Presence heartbeats track online devices with activity snapshots. Project management with task assignment, file sharing, and cross-machine shell execution. Target resolution by client ID, hostname, or platform name.
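Target resolution by client ID, hostname, or platform name might be implemented along these lines. The Client fields, the matching priority (ID first, then hostname, then platform), and the rule that offline clients are skipped are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Client:
    client_id: str
    hostname: str
    platform: str
    online: bool = True  # would be driven by presence heartbeats


def resolve_target(target: str, clients: list[Client]) -> Optional[Client]:
    """Find an online client by ID, then hostname, then platform (assumed order)."""
    needle = target.strip().lower()
    online = [c for c in clients if c.online]
    for field in ("client_id", "hostname", "platform"):
        for client in online:
            if getattr(client, field).lower() == needle:
                return client
    return None
```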

Voice Mode

Speech-to-speech via Nemotron ASR (0.6B) for transcription and Magpie TTS (357M) for synthesis. Opus-encoded audio input over WebSocket (24 kHz), PCM float32 output (24 kHz). Binary protocol: 0x00 handshake, 0x01 audio-in, 0x02 text, 0x03 audio-out. Continuous listening mode with camera capture during voice calls for image context via LLaVA.
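A small sketch of framing for the binary protocol, assuming the listed value is the first byte of each WebSocket message and everything after it is the payload. That layout is an assumption; the source gives only the four type codes.

```python
from enum import IntEnum


class FrameType(IntEnum):
    HANDSHAKE = 0x00
    AUDIO_IN = 0x01
    TEXT = 0x02
    AUDIO_OUT = 0x03


def encode_frame(kind: FrameType, payload: bytes = b"") -> bytes:
    """Prefix the payload with a one-byte type tag (assumed layout)."""
    return bytes([kind]) + payload


def decode_frame(frame: bytes) -> tuple[FrameType, bytes]:
    """Split a frame into its type tag and payload; unknown tags raise ValueError."""
    if not frame:
        raise ValueError("empty frame")
    return FrameType(frame[0]), frame[1:]
```

For example, a transcript message would be sent as encode_frame(FrameType.TEXT, "hello".encode("utf-8")), while audio frames carry raw Opus or PCM bytes as the payload.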

Persistent Memory

SQLite-backed memory with semantic search via nomic-embed-text embeddings. Stores facts, preferences, people, tasks, and conversation context across sessions. Automatic memory injection into prompts based on relevance. Access counting tracks memory importance over time.
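The combination of SQLite storage, similarity search, and access counting can be sketched as below. The table layout and function names are hypothetical, embeddings are stored as JSON text for simplicity, and the vectors are passed in by the caller rather than computed with nomic-embed-text, so the example runs without a model. A brute-force cosine scan like this is only reasonable for small stores.

```python
import json
import math
import sqlite3


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_store() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")  # a file path would make it persistent
    db.execute(
        "CREATE TABLE memories (id INTEGER PRIMARY KEY, text TEXT, "
        "embedding TEXT, access_count INTEGER DEFAULT 0)"
    )
    return db


def remember(db: sqlite3.Connection, text: str, embedding: list[float]) -> None:
    db.execute(
        "INSERT INTO memories (text, embedding) VALUES (?, ?)",
        (text, json.dumps(embedding)),
    )


def recall(db: sqlite3.Connection, query_embedding: list[float], top_k: int = 3) -> list[str]:
    """Return the top_k most similar memories and bump their access counts."""
    rows = db.execute("SELECT id, text, embedding FROM memories").fetchall()
    ranked = sorted(
        rows, key=lambda r: cosine(query_embedding, json.loads(r[2])), reverse=True
    )[:top_k]
    for row in ranked:
        db.execute(
            "UPDATE memories SET access_count = access_count + 1 WHERE id = ?",
            (row[0],),
        )
    return [row[1] for row in ranked]
```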

Five Client Interfaces

Server TUI (Textual) on the Spark, pip-installable Mac/PC/Linux CLI with braille spinner and trace visualization, native iOS & Android app (React + Capacitor) with voice/terminal/browser/file manager, Telegram bot with full tool access, and a web-based Command Center dashboard.

Command Center

Real-time system monitoring module: CPU, RAM, disk, GPU temperature and utilization, VRAM usage by process, conversation sessions across channels, activity feed, scheduled task status, and mesh node health. Available as both LLM-callable tools and a dedicated mobile app view with auto-refresh.

Scheduled Tasks & Agents

Cron-based task scheduler with natural language parsing, persistent storage, and automatic execution. Specialized subagents (code, research, writer, reasoning, search) can be dispatched in parallel with tool inheritance and result aggregation.
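Parallel subagent dispatch with tool inheritance and result aggregation could be structured like this asyncio sketch. The run_subagent body is a stand-in for a real model call, and the function names and result shape are assumptions for illustration.

```python
import asyncio


async def run_subagent(role: str, task: str, tools: list[str]) -> dict:
    """Stand-in for one subagent run; a real version would call a model."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"role": role, "tools": list(tools), "output": f"[{role}] {task}"}


async def dispatch(tasks: dict[str, str], parent_tools: list[str]) -> dict[str, str]:
    """Run one subagent per role concurrently and aggregate their outputs."""
    # Every subagent inherits the parent's tool list.
    jobs = [run_subagent(role, task, parent_tools) for role, task in tasks.items()]
    results = await asyncio.gather(*jobs)  # results keep the order of the jobs
    return {result["role"]: result["output"] for result in results}
```

A caller would run it with asyncio.run(dispatch({"research": "find sources", "writer": "draft summary"}, ["web_search"])).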

Security & Sandbox Isolation

Docker containerization for autonomous builds with resource limits (8 GB RAM, 4 CPUs, 256-PID maximum). Client allowlist via authorized.json. Tailscale VPN for encrypted mesh networking. Local-first inference keeps data on-device. Optional cloud escalation with explicit delegation.
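The stated resource limits map directly onto standard docker run flags. The sketch below builds such a command line; the image tag, mount path, and function name are assumptions, and the actual sandbox may set further options (networking, user, capabilities) that are not described here.

```python
def sandbox_command(image: str, workdir: str) -> list[str]:
    """Build a docker run argv with the limits listed above."""
    return [
        "docker", "run", "--rm",
        "--memory", "8g",        # 8 GB RAM cap
        "--cpus", "4",           # at most 4 CPUs
        "--pids-limit", "256",   # at most 256 processes
        "--volume", f"{workdir}:/workspace",  # hypothetical mount point
        image,
    ]
```

The resulting list can be passed to subprocess.run, for example sandbox_command("ubuntu:24.04", "/tmp/build").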

Browser Automation

Playwright-based headless browser with 9 tools: open, navigate, click, type, fill forms, select dropdowns, screenshot, extract content, and close. Persistent session data with 5-minute inactivity timeout.
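The 5-minute inactivity timeout can be modeled independently of Playwright. This sketch tracks only the timing logic; the class name and injectable clock are assumptions, and a real session would also hold the browser object and close it on expiry.

```python
import time
from typing import Callable


class BrowserSession:
    """Tracks last use so an idle session can be closed after a timeout."""

    def __init__(self, timeout_s: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s  # 300 s = 5 minutes
        self._clock = clock
        self._last_used = clock()

    def touch(self) -> None:
        """Call on every tool invocation (open, click, type, and so on)."""
        self._last_used = self._clock()

    def expired(self) -> bool:
        return self._clock() - self._last_used >= self._timeout_s
```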

Social Media & Publishing

Post to Twitter/X, LinkedIn, and Bluesky with image attachments and platform-specific formatting. Substack integration for draft creation, editing, publishing, and statistics tracking. Rate limit enforcement per platform.

Image Generation

FLUX image generation via ComfyUI with LoRA support (Stillion style, marker-mech style). Configurable prompt, dimensions, seed, and steps. Generated images delivered to the shared folder with markdown preview. Mesh router auto-discovers ComfyUI across the network.

Best Practice Guides

10 markdown reference guides (coding, website design, graphic design, color theory, writing, API design, prompt engineering, image generation, cybersecurity, web UI/UX patterns) automatically injected into the system prompt based on keyword triggers in the user’s message. Max 2 guides per turn to keep context focused. Gives the model domain expertise on demand without permanent context bloat.
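Keyword-triggered guide selection with a cap of two per turn might look like the following. The guide names, trigger words, and the ranking rule (most keyword hits first, ties broken alphabetically) are assumptions; the source states only that triggers are keyword-based and that at most two guides are injected.

```python
import re

# Hypothetical trigger words for three of the guides.
GUIDE_TRIGGERS = {
    "coding": {"code", "function", "bug"},
    "api_design": {"api", "endpoint", "rest"},
    "color_theory": {"color", "palette"},
}


def pick_guides(message: str, limit: int = 2) -> list[str]:
    """Return up to `limit` guide names, ranked by number of keyword hits."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    hits = {name: len(words & triggers) for name, triggers in GUIDE_TRIGGERS.items()}
    ranked = sorted(
        (name for name, count in hits.items() if count > 0),
        key=lambda name: (-hits[name], name),
    )
    return ranked[:limit]
```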

Skills Plugin System

Python-based plugin framework with @skill decorator registration. Each skill is exposed as an OpenAI-compatible tool with typed parameters and JSON schema. Built-in skills include weather, calculator, stocks, screenshots, datetime utilities, notes, and system info. User-installable skills via ~/.config/maude/skills/. Enable, disable, and reload at runtime via /skills command.
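A decorator-based registry that emits OpenAI-style tool definitions could be sketched as follows. This is not MAUDE's actual @skill implementation: the decorator signature, the registry name, and the type mapping are assumptions, and only simple scalar parameter types are handled.

```python
import inspect
from typing import Callable, get_type_hints

# Hypothetical registry: skill name -> OpenAI-style tool definition.
REGISTRY: dict[str, dict] = {}
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def skill(description: str) -> Callable:
    """Register a function as a tool, deriving a JSON schema from its signature."""
    def register(func: Callable) -> Callable:
        hints = get_type_hints(func)
        params = inspect.signature(func).parameters
        properties = {
            # Unannotated or unsupported types fall back to "string".
            name: {"type": JSON_TYPES.get(hints.get(name), "string")}
            for name in params
        }
        required = [
            name for name, p in params.items() if p.default is inspect.Parameter.empty
        ]
        REGISTRY[func.__name__] = {
            "type": "function",
            "function": {
                "name": func.__name__,
                "description": description,
                "parameters": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
            },
        }
        return func
    return register


@skill("Add two numbers.")
def add(a: float, b: float = 0.0) -> float:
    return a + b
```

After import, REGISTRY["add"] holds a definition that can be passed in the tools list of an OpenAI-compatible chat request, and the decorated function remains callable as usual.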