# Audio Chat

## Running

```bash
uvicorn main:app --host 0.0.0.0 --port 8000
```

Client (for testing):

```bash
python client.py
```

## Architecture

Single-process FastAPI server. On each WebSocket connection a new `AudioSession` is created with three engines:

| Module | Purpose | Model (default) |
|--------|---------|-----------------|
| `engine/stt.py` | Speech-to-text | Systran/faster-whisper-large-v3 |
| `engine/llm.py` | LLM response generation | Qwen/Qwen2.5-7B-Instruct |
| `engine/tts.py` | Text-to-speech | facebook/mms-tts-rus |

Models are loaded lazily on first use if `initialize()` was not called. STT always runs in Russian (`language="ru"` with VAD).

## WebSocket Protocol

| Direction | Format | Meaning |
|-----------|--------|---------|
| Client → Server | `b"A" + PCM data` | Send audio chunk |
| Client → Server | `b"R"` | Reset conversation |
| Server → Client | `b"O" + WAV bytes` | LLM response as audio |
| Server → Client | `"TEXT:"` | Recognized speech |

Audio format: 16-bit PCM mono, 16 kHz input / 24 kHz output.

## Configuration

All settings are read from `.env` (loaded by `config.py`). Key variables:

- `DEVICE`: `"cuda"` or `"cpu"` (default `"auto"`)
- `AUDIO_BUFFER_SECONDS` / `CHUNK_SIZE`: silence detection thresholds
- `LLM_MAX_TOKENS` / `LLM_TEMPERATURE`: generation parameters

## Gotchas

- No test suite or linting is configured.
- Models download on first use; ensure network access to HuggingFace.
- `AudioSession` holds conversation history (last 6 turns) in memory, so each WebSocket reconnect resets it.
- The thread pool executor is fixed at 2 workers; concurrent heavy requests will queue.
- The TTS pipeline silently falls back to CPU (`device=-1`) if GPU initialization fails.
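
The message framing listed under WebSocket Protocol can be illustrated with a few pure helper functions. This is a minimal sketch, not code from `client.py`: the names `encode_audio_chunk`, `silence_chunk`, `decode_server_message`, and `RESET_MESSAGE` are hypothetical, and it assumes that the recognized transcript follows the `"TEXT:"` prefix in a text frame and that binary frames starting with `b"O"` carry a complete WAV payload.

```python
from typing import Tuple, Union

INPUT_SAMPLE_RATE = 16_000   # Hz, from the README: 16 kHz input
BYTES_PER_SAMPLE = 2         # 16-bit PCM mono

# Client -> Server: a single byte b"R" resets the conversation.
RESET_MESSAGE = b"R"


def encode_audio_chunk(pcm: bytes) -> bytes:
    """Frame raw 16-bit PCM mono audio as a client audio message."""
    return b"A" + pcm


def silence_chunk(milliseconds: int) -> bytes:
    """Build a chunk of digital silence, handy for exercising the server."""
    samples = INPUT_SAMPLE_RATE * milliseconds // 1000
    return b"\x00" * (samples * BYTES_PER_SAMPLE)


def decode_server_message(message: Union[str, bytes]) -> Tuple[str, Union[str, bytes]]:
    """Classify a server frame as ("text", transcript), ("audio", wav) or ("unknown", raw)."""
    if isinstance(message, str):
        # Assumption: the transcript is whatever follows the "TEXT:" prefix.
        if message.startswith("TEXT:"):
            return ("text", message[len("TEXT:"):])
        return ("unknown", message)
    if message[:1] == b"O":
        # Assumption: the rest of the frame is a complete WAV file.
        return ("audio", message[1:])
    return ("unknown", message)
```

A client would send `encode_audio_chunk(...)` or `RESET_MESSAGE` as binary frames and pass every received frame through `decode_server_message`. Keeping the framing separate from the socket code lets it be checked without a running server.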