Add support for Elevenlabs conversational API
under review
Daniel Nguyen
I haven't dived deep into their documentation yet. I figure it would work similarly to the current Advanced Voice Mode, right?
kernkraft
Daniel Nguyen it's WebSockets. It's actually pretty basic if you let 11labs handle the LLM part, and only marginally harder if you want to bring a custom LLM. The complete conversational agent config can also be updated over the API by sending the same JSON you used to create it. It's really, really simple once the WebSocket connection is set up. It also supports MCP over SSE. The solution is fully there; it's just that no one is really capitalizing on voice agents. OpenAI is kind of sleeping on theirs because their realtime model isn't very smart. I use my 11labs agent every day; it's smart, it can use tools, and you're not restricted to a dumbed-down realtime model. I think it's a feature people don't realize how valuable it is until they've used it.
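To make the "just set up the WebSocket connection" part concrete, here's a minimal sketch. The endpoint and the `user_audio_chunk` frame shape match the ElevenLabs Conversational AI docs as I understand them, but double-check the current reference before relying on them; the commented session loop assumes the third-party `websockets` library.

```python
import base64
import json

# Sketch, assuming the documented Conversational AI WebSocket endpoint.
CONVAI_WS = "wss://api.elevenlabs.io/v1/convai/conversation"

def conversation_url(agent_id: str) -> str:
    """Build the WebSocket URL for a conversational agent session."""
    return f"{CONVAI_WS}?agent_id={agent_id}"

def user_audio_message(pcm_bytes: bytes) -> str:
    """Wrap a chunk of user audio as a JSON frame: base64-encoded audio
    in a 'user_audio_chunk' field (shape assumed from the docs)."""
    return json.dumps(
        {"user_audio_chunk": base64.b64encode(pcm_bytes).decode("ascii")}
    )

# With e.g. the `websockets` library, the session loop is roughly:
#   async with websockets.connect(conversation_url(AGENT_ID)) as ws:
#       await ws.send(user_audio_message(chunk))
#       event = json.loads(await ws.recv())  # audio / transcript events
```

That's the whole transport: send audio frames up, read JSON events back, and 11labs handles turn-taking, the LLM, and TTS server-side.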
kernkraft
Another thing people haven't really figured out, even though the solution is there: one limitation of voice agents like this is that they can't output things for you to copy and paste, like code artifacts or longer writing. It would be really cool if this could be combined with structured outputs, so that one part of the output is what the LLM says out loud (with voice) while another part is rendered in the chat as code blocks or artifacts. You could use voice interactively but still receive copy-and-pasteable outputs.