Assistant, Voice, and TTS

The webserver includes an optional conversational assistant with text and voice interaction, plus a managed text-to-speech (TTS) voice library. These features are disabled by default and enabled through configuration.

Conversational Assistant

When enabled, the assistant supports text chat, spoken input, and streamed responses. It can optionally execute a curated set of robot tools so that spoken or typed requests translate into robot actions.

Supported backends:

Cloud assistants (selectable via `llm_backend`)
A local model served through Ollama (`llm_ollama_host` / `llm_ollama_model`)

Tool execution is gated by `llm_tools_enabled` and a tool configuration file, so the set of actions the assistant may perform is explicit and controlled.
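
The gating described above can be pictured with a small sketch. The parameter names are the ones listed in this section. Everything else is an assumption for illustration: the `AssistantConfig` class, the `may_execute` helper, the `allowed_tools` field (a stand-in for whatever the tool configuration file actually contains), the value `"ollama"` for `llm_backend`, and the placeholder model and tool names. The host shown is Ollama's usual default address.

```python
from dataclasses import dataclass, field


@dataclass
class AssistantConfig:
    # Parameter names come from this section; the values are placeholders.
    llm_backend: str = "ollama"  # assumed value, not confirmed by the docs
    llm_ollama_host: str = "http://localhost:11434"
    llm_ollama_model: str = "llama3"  # placeholder model name
    llm_tools_enabled: bool = False  # tools stay off unless explicitly enabled
    # Hypothetical stand-in for the contents of the tool configuration file.
    allowed_tools: set[str] = field(default_factory=set)


def may_execute(config: AssistantConfig, tool_name: str) -> bool:
    """Return True only if tools are enabled AND the tool is allowlisted."""
    return config.llm_tools_enabled and tool_name in config.allowed_tools


# Usage: a tool listed in the file is still refused until tools are enabled.
cfg = AssistantConfig(allowed_tools={"example_tool"})
blocked = may_execute(cfg, "example_tool")
cfg.llm_tools_enabled = True
allowed = may_execute(cfg, "example_tool")
```

Requiring both conditions means that neither a stale tool file nor an accidentally enabled flag is enough on its own to expose an action.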

Voice Workflow

The voice pipeline combines speech-to-text (STT), the assistant, and text-to-speech (TTS) into a spoken interaction loop:

An activation word starts listening.
Captured speech is transcribed and passed to the assistant.
The assistant response is spoken back through the configured TTS voice.

Offline STT assets are vendored with the package so basic voice features work without internet access.

Parameter	Example
`llm_tts_voice`	`en-GB-RyanNeural` (online voice)
`llm_tts_offline_voice`	`en-gb` (offline fallback)
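
One turn of the loop above might look like the following sketch. It is a simplification under stated assumptions, not the package's implementation: the activation word `"robot"` is hypothetical, the activation check runs on an already-transcribed string rather than on audio, the `ask_assistant` and `speak` callables stand in for the real assistant and TTS engine, and choosing the offline voice whenever no connection is available is an assumed policy.

```python
from typing import Callable, Optional

ACTIVATION_WORD = "robot"  # hypothetical; the real activation word is set elsewhere


def pick_voice(online: bool, tts_voice: str, offline_voice: str) -> str:
    """Use the online voice when connected, else the offline fallback (assumed policy)."""
    return tts_voice if online else offline_voice


def handle_utterance(
    transcript: str,
    ask_assistant: Callable[[str], str],
    speak: Callable[[str, str], None],
    voice: str,
) -> Optional[str]:
    """Run one turn of the loop on already-transcribed speech.

    Returns the assistant reply, or None if the activation word was absent.
    """
    words = transcript.strip().split(maxsplit=1)
    if not words or words[0].lower().strip(",.!?") != ACTIVATION_WORD:
        return None  # not addressed to the assistant, so nothing is spoken
    request = words[1] if len(words) > 1 else ""
    reply = ask_assistant(request)  # stand-in for the configured backend
    speak(reply, voice)  # stand-in for the TTS engine
    return reply


# Usage with stub callables and the example voices from the table above.
spoken = []
voice = pick_voice(False, "en-GB-RyanNeural", "en-gb")
reply = handle_utterance(
    "Robot, go home",
    lambda text: f"OK: {text}",
    lambda text, v: spoken.append((text, v)),
    voice,
)
```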

TTS Library

The TTS library panel manages the set of saved voice clips used by the assistant and the `play_audio` skill. From the panel you can:

List available voices and clips
Preview a voice
Save, delete, and play back clips

Clips are stored under the package audio library and are shared with the `play_audio` skill described in Skills and Missions.

Integration Endpoints

Endpoint	Purpose
`/api/llm/config`	Assistant configuration.
`/chat` (text/audio/stream)	Chat interaction endpoints for text, audio, and streamed responses.
`/api/tts_library`	Voice list, preview, save, delete, and playback.
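
As a rough illustration of calling these endpoints from a script, the sketch below builds endpoint URLs and issues a plain GET using only the Python standard library. The base URL, the use of GET, and a JSON response body are all assumptions, since this section does not document request methods or payload formats. The helper names `endpoint_url` and `get_json` are hypothetical.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

BASE_URL = "http://localhost:8080/"  # assumption: replace with your webserver's address


def endpoint_url(path: str, base_url: str = BASE_URL) -> str:
    """Join a documented endpoint path onto the webserver base URL."""
    return urljoin(base_url, path.lstrip("/"))


def get_json(path: str, base_url: str = BASE_URL, timeout: float = 5.0):
    """GET an endpoint and decode the body as JSON (assumed response format)."""
    with urlopen(endpoint_url(path, base_url), timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no running server.
config_url = endpoint_url("/api/llm/config")
```

With a server running, `get_json("/api/llm/config")` would attempt to fetch the assistant configuration. Check the actual method and response shape against the server before relying on it.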

Note

The assistant, voice, and tool execution features are optional and disabled by default. Enable them only after configuring the relevant backend and reviewing which tools the assistant is allowed to execute.