CLAUDE March 28, 2026

Extended Thinking Now Streams in Real Time

WHAT CHANGED

Extended thinking mode in Claude now streams thinking tokens in real time. Previously, the full thinking block was buffered before delivery.

WHY IT MATTERS

Users see the model's reasoning as it is produced instead of staring at a blank screen. For long reasoning tasks, perceived latency drops because output starts appearing before the thinking block is complete.

HOW TO USE IT

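Enable streaming together with the thinking parameter in your API call, then handle content_block_delta events: deltas of type thinking_delta carry the reasoning, and deltas of type text_delta carry the final answer.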

CLAUDE / TYPESCRIPT

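import Anthropic from "@anthropic-ai/sdk";

// By default the client reads the API key from the ANTHROPIC_API_KEY environment variable.
const anthropic = new Anthropic();

// Placeholder prompt for illustration; replace with your own input.
const prompt = "Explain why the sum of two odd numbers is always even.";

// budget_tokens caps the thinking tokens and must be smaller than max_tokens.
const stream = await anthropic.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 10000 },
  messages: [{ role: "user", content: prompt }]
});

for await (const event of stream) {
  if (event.type === "content_block_delta") {
    if (event.delta.type === "thinking_delta") {
      // Reasoning tokens, written out as they arrive.
      process.stdout.write(event.delta.thinking);
    } else if (event.delta.type === "text_delta") {
      // Final answer tokens.
      process.stdout.write(event.delta.text);
    }
  }
}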
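
If the UI needs to show reasoning and answer in separate panes rather than interleaved on stdout, the same two delta types can be routed into separate buffers. The following is a minimal sketch, not part of the SDK: the names Transcript, applyEvent, and collect are hypothetical, and the StreamEvent type is a simplified local stand-in that models only the fields used in the loop above (a real stream emits other event types, which this sketch ignores).

```typescript
// Simplified local event shapes (assumption: only the fields used above are modeled).
type ThinkingDelta = { type: "thinking_delta"; thinking: string };
type TextDelta = { type: "text_delta"; text: string };
type StreamEvent =
  | { type: "content_block_delta"; delta: ThinkingDelta | TextDelta }
  | { type: "message_stop" };

// Hypothetical accumulator: reasoning and answer kept in separate buffers.
interface Transcript {
  thinking: string;
  text: string;
}

// Returns a new Transcript with the event's delta appended to the matching buffer.
// Events that are not content_block_delta leave the transcript unchanged.
function applyEvent(acc: Transcript, event: StreamEvent): Transcript {
  if (event.type !== "content_block_delta") {
    return acc;
  }
  const delta = event.delta;
  switch (delta.type) {
    case "thinking_delta":
      return { ...acc, thinking: acc.thinking + delta.thinking };
    case "text_delta":
      return { ...acc, text: acc.text + delta.text };
    default:
      return acc;
  }
}

// Folds a sequence of events into a final Transcript.
function collect(events: Iterable<StreamEvent>): Transcript {
  let acc: Transcript = { thinking: "", text: "" };
  for (const event of events) {
    acc = applyEvent(acc, event);
  }
  return acc;
}
```

In a live handler, applyEvent would be called once per event inside the for await loop, and the UI re-rendered from the returned Transcript after each call.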

thinking, streaming, api

ORIGINAL SOURCE

https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking

Streaming extended thinking is a major UX unlock. Instead of waiting 10–30 seconds for a response to appear, users watch the reasoning unfold in real time. This is particularly valuable for complex coding and analysis tasks where the thinking process itself is informative.
