Assistant, Voice, and TTS
The webserver includes an optional conversational assistant with text and voice interaction, plus a managed text-to-speech (TTS) voice library. These features are disabled by default and enabled through configuration.
Conversational Assistant
When enabled, the assistant supports text chat, spoken input, and streamed responses. It can optionally execute a curated set of robot tools so that spoken or typed requests translate into robot actions.
Supported backends:
Cloud assistants (selectable via llm_backend)
A local model served through Ollama (llm_ollama_host / llm_ollama_model)
Tool execution is gated by llm_tools_enabled and a tool configuration file,
so the set of actions the assistant may perform is explicit and controlled.
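One way to picture this gating is an allowlist filter applied before any tool reaches the assistant. The sketch below is illustrative only: `build_tool_registry`, `call_tool`, and the tool names are hypothetical, and the list of allowed names stands in for whatever the tool configuration file actually contains. Only the `llm_tools_enabled` flag comes from the documentation above.

```python
from typing import Callable, Dict, Iterable

# Hypothetical tool signature: each tool returns a short status string.
ToolFn = Callable[..., str]


def build_tool_registry(
    llm_tools_enabled: bool,
    allowed_tools: Iterable[str],
    available_tools: Dict[str, ToolFn],
) -> Dict[str, ToolFn]:
    """Return only the tools the assistant may execute.

    `allowed_tools` stands in for the parsed tool configuration file
    (an assumption; the real file format is not described here).
    """
    if not llm_tools_enabled:
        # Master switch off: the assistant gets no tools at all.
        return {}
    allowed = set(allowed_tools)
    return {name: fn for name, fn in available_tools.items() if name in allowed}


def call_tool(registry: Dict[str, ToolFn], name: str, **kwargs) -> str:
    """Run a tool by name, refusing anything outside the registry."""
    if name not in registry:
        raise PermissionError(f"tool {name!r} is not enabled")
    return registry[name](**kwargs)
```

With this shape, a tool that exists in the code but is missing from the configuration can never be invoked, which matches the intent that the action set stays explicit.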
Voice Workflow
The voice pipeline combines speech-to-text (STT), the assistant, and text-to-speech (TTS) into a spoken interaction loop:
An activation word starts listening.
Captured speech is transcribed and passed to the assistant.
The assistant response is spoken back through the configured TTS voice.
Offline STT assets are vendored with the package so basic voice features work without internet access.
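The three steps above can be sketched as a small loop. This is a minimal illustration, not the package's implementation: `run_voice_loop`, `ask_assistant`, and `speak` are hypothetical names, and the input is assumed to be text that an STT engine has already transcribed.

```python
from typing import Callable, Iterable


def run_voice_loop(
    utterances: Iterable[str],
    activation_word: str,
    ask_assistant: Callable[[str], str],
    speak: Callable[[str], None],
) -> None:
    """Process already-transcribed utterances one at a time."""
    wake = activation_word.lower()
    listening = False
    for text in utterances:
        if not listening:
            # Step 1: stay idle until the activation word is heard.
            words = [w.strip(".,!?") for w in text.lower().split()]
            listening = wake in words
            continue
        # Step 2: pass the captured request to the assistant.
        reply = ask_assistant(text)
        # Step 3: speak the response through the TTS callback.
        speak(reply)
        # Return to idle; the next request needs the activation word again.
        listening = False
```

Passing the assistant and TTS as callables keeps the loop testable without a microphone, a model, or audio output.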
| Parameter | Example |
|---|---|
| | |
| | |
TTS Library
The TTS library panel manages the set of saved voice clips used by the
assistant and the play_audio skill. From the panel you can:
List available voices and clips
Preview a voice
Save, delete, and play back clips
Clips are stored under the package audio library and are shared with the
play_audio skill described in Skills and Missions.
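The list, save, and delete operations can be pictured as a thin layer over a directory of audio files. The sketch below assumes a file-backed store with one `.wav` file per clip; the `ClipLibrary` class, its methods, and the file extension are hypothetical and do not describe the package's actual storage layout or API.

```python
from pathlib import Path
from typing import List


class ClipLibrary:
    """Hypothetical file-backed clip store (not the package's real API)."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, name: str) -> Path:
        # Reject empty names and names containing path separators so a
        # clip can never be written outside the library directory.
        if not name or Path(name).name != name:
            raise ValueError(f"invalid clip name: {name!r}")
        return self.root / f"{name}.wav"

    def save(self, name: str, audio: bytes) -> Path:
        """Write the clip bytes and return the file path."""
        path = self._path(name)
        path.write_bytes(audio)
        return path

    def list_clips(self) -> List[str]:
        """Return clip names in alphabetical order."""
        return sorted(p.stem for p in self.root.glob("*.wav"))

    def delete(self, name: str) -> bool:
        """Delete a clip; return False if it did not exist."""
        path = self._path(name)
        if path.exists():
            path.unlink()
            return True
        return False
```

Because both the assistant and the play_audio skill read from the same library, a single store like this would be the only place clips need to be added or removed.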
Integration Endpoints
| Endpoint | Purpose |
|---|---|
| | Assistant configuration. |
| | Chat interaction endpoints. |
| | Voice list, preview, save, delete, and playback. |
Note
The assistant, voice, and tool execution features are optional and disabled by default. Enable them only after configuring the relevant backend and reviewing the tool surface.