Add Kokoro TTS as a built-in speech engine in Bolt AI 2, alongside the existing OpenAI audio backend. This would enable low-latency, high-quality, fully local text-to-speech for LLM responses.
Motivation
Kokoro TTS is fast and lightweight, making it well suited to speaking responses in real time as the LLM streams tokens.
Many users prefer local speech over cloud APIs for privacy, offline use, and cost savings.
Making Kokoro a core, selectable backend would strengthen Bolt AI 2 for multimodal, voice-first workflows.
Functionality
  • Add Kokoro as a selectable backend in the audio/TTS settings.
  • Allow configuring the model, voice, speed, and response options.
  • Support streaming: send audio chunks incrementally and start playback before synthesis finishes, so speech feels real-time.
  • Support either a local Kokoro process or a remote endpoint, configurable via URL, API key, and request mapping (see the sketch after this list).
  • UI controls: a "Play with Kokoro" button, a speech-status indicator, and a voice selection dropdown.
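To make the streaming and remote-endpoint bullets concrete, here is a minimal sketch of how Bolt AI 2 could request speech from a Kokoro backend. It assumes an OpenAI-compatible Kokoro server (for example, Kokoro-FastAPI) reachable on localhost; the URL, port, voice name, and payload fields are illustrative assumptions, not Bolt AI 2's actual settings.

```python
# Minimal sketch: stream synthesized speech from an assumed local Kokoro server.
import requests

KOKORO_URL = "http://localhost:8880/v1/audio/speech"  # assumed local endpoint


def speak(text: str, voice: str = "af_bella", speed: float = 1.0) -> None:
    """Request speech for `text` and write audio chunks as they arrive."""
    payload = {
        "model": "kokoro",          # model name expected by the server (assumption)
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": "mp3",
    }
    with requests.post(KOKORO_URL, json=payload, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open("reply.mp3", "wb") as out:
            # Chunks arrive incrementally, so playback could begin before the
            # full response is generated (here we simply write them to disk).
            for chunk in resp.iter_content(chunk_size=4096):
                out.write(chunk)


if __name__ == "__main__":
    speak("Hello from a local Kokoro voice.")
```

In the app itself, the same chunks would be fed to the audio output as they arrive rather than written to a file, which is what lets playback start before the full response is synthesized.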
Examples
  • Global settings: audio provider, endpoint URL, default voice, and speed.
  • Per assistant: enable auto-speak and streaming (both levels are sketched below).
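As an illustration of the two configuration levels, the sketch below shows one possible shape for these settings. Every key name and value is an assumption for illustration; Bolt AI 2's real configuration schema may differ.

```python
# Hypothetical settings shapes for illustration only; none of these key names
# come from Bolt AI 2 itself.
GLOBAL_AUDIO_SETTINGS = {
    "provider": "kokoro",
    "endpoint": "http://localhost:8880/v1/audio/speech",  # assumed local server
    "default_voice": "af_bella",                          # example Kokoro voice
    "speed": 1.0,
}

ASSISTANT_OVERRIDES = {
    "voice-first-assistant": {
        "auto_speak": True,   # speak every response automatically
        "streaming": True,    # begin playback while audio is still generating
    },
}
```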
Benefits
  • Private local voice for sensitive use.
  • Low-latency, real-time interactions.
  • Cost savings with local compute.
  • A natural fit for power users who already run local models and self-hosted AI.