CHATGPT February 10, 2026

OpenAI Realtime API Supports Text and Audio in Same Session

WHAT CHANGED

OpenAI's Realtime API now supports mixing text and audio modalities in the same WebSocket session. Send text, receive audio, or switch modes mid-conversation.

WHY IT MATTERS

Build voice assistants with text fallback, or multimodal apps where users switch between typing and speaking without breaking the session.

HOW TO USE IT

Connect to the Realtime API WebSocket and specify both text and audio in the modalities array of your session config.

JAVASCRIPT
import WebSocket from "ws";

const ws = new WebSocket(
  "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview",
  {
    headers: {
      "Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
      "OpenAI-Beta": "realtime=v1"
    }
  }
);

ws.on("open", () => {
  ws.send(JSON.stringify({
    type: "session.update",
    session: {
      modalities: ["text", "audio"],
      voice: "alloy"
    }
  }));
});

ws.on("message", (data) => {
  const event = JSON.parse(data.toString());
  if (event.type === "response.audio.delta") {
    // event.delta is a base64-encoded audio chunk
  }
  if (event.type === "response.text.delta") {
    process.stdout.write(event.delta);
  }
});
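Once the session is configured for both modalities, a typed message can be injected into the same open socket mid-conversation. A minimal sketch, assuming the Realtime API beta client-event schema (`conversation.item.create` to queue the user message, then `response.create` to trigger a reply):

```javascript
// Build a client event that adds a typed user message to the
// conversation on the already-open WebSocket session.
function textMessageEvent(text) {
  return {
    type: "conversation.item.create",
    item: {
      type: "message",
      role: "user",
      content: [{ type: "input_text", text }]
    }
  };
}

// After queuing the item, ask the model to respond.
const responseEvent = { type: "response.create" };

// Usage with the socket from the example above:
// ws.send(JSON.stringify(textMessageEvent("What's the weather like?")));
// ws.send(JSON.stringify(responseEvent));
```

Because the session's modalities include both text and audio, the reply to this typed input can still arrive as audio deltas on the same connection.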
ORIGINAL SOURCE
https://openai.com/index/introducing-the-realtime-api/

Mixed-modality sessions remove a major friction point in voice app development. Previously, switching between text and audio required separate API calls or a session restart. Now a single persistent WebSocket handles both, significantly simplifying the architecture of any voice-enabled product.
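The switch can also be scoped to a single turn rather than the whole session. A sketch, assuming the beta schema's per-response override, where `response.create` accepts a `response.modalities` field that supersedes the session-level setting for that one reply:

```javascript
// Build a response.create event that forces a specific output
// modality for one turn only (e.g. text-only while audio is muted),
// leaving the session-level config untouched.
function responseWithModalities(modalities) {
  return {
    type: "response.create",
    response: { modalities }
  };
}

// Usage on the open socket:
// ws.send(JSON.stringify(responseWithModalities(["text"])));
```

Per-turn overrides like this are what make the "text fallback" pattern cheap: the socket and conversation state persist while only the output format changes.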
