
The Problem

AI agents that need to generate speech usually have to depend on cloud APIs, which bring proprietary models, rate limits, privacy concerns, and vendor lock-in. For agentic workflows that need real-time TTS, a remote API is a fragile foundation.

The Solution

PocketTTS-MCP is a Model Context Protocol (MCP) server that wraps Kyutai Labs' open-source Pocket TTS model. It exposes TTS generation over MCP, so AI agents and tools such as Cursor, Claude Desktop, and n8n can call local neural TTS as a native tool, with no API key required.

Technical Implementation

The server is built in Python with FastMCP and loads the Pocket TTS GGUF model directly. It handles voice selection, text tokenization, and audio synthesis entirely on the local machine. The MCP interface maps directly onto the underlying model, exposing voices, synthesis parameters, and audio format options. The server runs on CPU or GPU with automatic hardware detection.
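To make the shape of such a server concrete, here is a minimal sketch of how two tools (listing voices and synthesizing speech) might be registered with FastMCP. It is not the project's actual code: the tool names `list_voices` and `speak`, the voice names, the server name, and the 24 kHz sample rate are all hypothetical, and `synthesize` is a placeholder that writes a sine tone where the real server would run Pocket TTS. The only FastMCP calls used are the constructor and the `tool()` decorator from the official MCP Python SDK; the import is guarded so the rest of the sketch runs without the SDK installed.

```python
import io
import math
import struct
import wave

try:
    # FastMCP as shipped in the official MCP Python SDK (pip install mcp).
    from mcp.server.fastmcp import FastMCP
except ImportError:  # the synthesis helpers below still work without it
    FastMCP = None

SAMPLE_RATE = 24_000  # arbitrary choice for the placeholder audio

# Hypothetical voice names; each maps to a tone pitch (Hz) for the placeholder.
VOICES = {"default": 220.0, "bright": 440.0}


def synthesize(text: str, voice: str = "default") -> bytes:
    """Placeholder for the model call: returns WAV bytes containing a sine tone.

    A real server would run Pocket TTS here. Duration grows with text length
    (1/20 s per character) and is capped at two seconds.
    """
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    n_samples = min(len(text) * SAMPLE_RATE // 20, 2 * SAMPLE_RATE)
    freq = VOICES[voice]
    frames = b"".join(
        struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE)))
        for i in range(n_samples)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit PCM
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)
    return buf.getvalue()


def list_voices() -> list[str]:
    """Return the names of the available voices."""
    return sorted(VOICES)


def speak(text: str, voice: str = "default", out_path: str = "speech.wav") -> str:
    """Synthesize `text` to a WAV file and return the path that was written."""
    with open(out_path, "wb") as f:
        f.write(synthesize(text, voice))
    return out_path


def build_server():
    """Register the functions above as MCP tools on a FastMCP server."""
    if FastMCP is None:
        raise RuntimeError("the 'mcp' package is required to build the server")
    server = FastMCP("pockettts-mcp")  # hypothetical server name
    server.tool()(list_voices)
    server.tool()(speak)
    return server
```

Calling `build_server().run()` would start the server on FastMCP's default stdio transport, which is how desktop MCP clients typically launch local servers. Returning a file path from `speak` rather than raw audio bytes is one possible design choice that keeps tool responses small; the actual project may return audio differently.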

FastMCP, Pocket TTS, Python, GGUF, MCP