Files
audio-chat/AGENTS.md
noturum 1edfd5d62f Initial commit: audio-chat with fixes
- Created AGENTS.md with architecture documentation
- Fixed race conditions and async patterns
- Added conversation history to LLM prompts
- Fixed TTS audio shape handling
- Added buffer limits and graceful shutdown
- Fixed client.py with file sending support
- Removed duplicate requirements
- Added .gitignore
2026-05-01 13:01:06 +00:00

1.7 KiB

Audio Chat

Running

uvicorn main:app --host 0.0.0.0 --port 8000

Client (for testing):

python client.py

Architecture

Single-process FastAPI server. On each WebSocket connection a new AudioSession is created with three engines:

Module Purpose Model (default)
engine/stt.py Speech-to-text Systran/faster-whisper-large-v3
engine/llm.py LLM response generation Qwen/Qwen2.5-7B-Instruct
engine/tts.py Text-to-speech facebook/mms-tts-rus

Models are loaded lazily on first use if initialize() was not called. STT always runs in Russian (language="ru" with VAD).

WebSocket Protocol

Direction Format Meaning
Client → Server b"A" + PCM data Send audio chunk
Client → Server b"R" Reset conversation
Server → Client b"O" + WAV bytes LLM response as audio
Server → Client "TEXT:<transcription>" Recognized speech

Audio format: 16-bit PCM mono, 16 kHz input / 24 kHz output.

Configuration

All settings via .env (loaded by config.py). Key vars:

  • DEVICE"cuda" or "cpu" (default "auto")
  • AUDIO_BUFFER_SECONDS / CHUNK_SIZE — silence detection thresholds
  • LLM_MAX_TOKENS / LLM_TEMPERATURE — generation parameters

Gotchas

  • No test suite or linting configured.
  • Models download on first use; ensure network access to HuggingFace.
  • AudioSession holds conversation history (last 6 turns) in memory — each WebSocket reconnect resets it.
  • Thread pool executor is fixed at 2 workers; concurrent heavy requests will queue.
  • TTS pipeline falls back to CPU (device=-1) if GPU initialization fails silently.