CHATGPT February 10, 2026
OpenAI Realtime API Supports Text and Audio in Same Session
WHAT CHANGED
OpenAI's Realtime API now supports mixing text and audio modalities in the same WebSocket session. Send text, receive audio, or switch modes mid-conversation.
WHY IT MATTERS
Build voice assistants with text fallback, or multimodal apps where users switch between typing and speaking without breaking the session.
HOW TO USE IT
Connect to the Realtime API WebSocket and specify both text and audio in the modalities array of your session config.
Tags: realtime, audio, websocket, voice
ORIGINAL SOURCE
Mixed-modality sessions remove a major friction point in voice app development. Previously, switching between text and audio required separate API calls or session restarts. Now a single persistent WebSocket handles both, significantly simplifying the architecture of any voice-enabled product.