CLAUDE NEW April 10, 2026

Claude Sonnet 4.6: Extended Thinking Streams 2x Faster

WHAT CHANGED

Extended thinking in claude-sonnet-4-6 now streams at 2x the previous throughput. Internal reasoning tokens arrive in real time via the thinking content block, with no added latency before the first token.
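
One rough way to check time to first delta and streaming rate in your own app is to record a timestamp for each delta and summarize afterwards. The sketch below is illustrative only: DeltaSample, StreamSummary, and summarizeStream are hypothetical names, not part of any SDK, and characters per second stands in for token throughput because deltas carry text rather than token counts.

```typescript
// Hypothetical helper types: one entry per received delta.
interface DeltaSample {
  atMs: number;  // timestamp when the delta arrived, in milliseconds
  chars: number; // length of the delta's text
}

interface StreamSummary {
  timeToFirstDeltaMs: number;
  charsPerSecond: number | null; // null when the window is too short to measure
}

// Summarize a stream given the request start time and the recorded samples.
function summarizeStream(
  startMs: number,
  samples: DeltaSample[]
): StreamSummary | null {
  const first = samples[0];
  const last = samples[samples.length - 1];
  if (!first || !last) return null; // no deltas recorded

  // Count only characters that arrived after the first delta, so the rate
  // covers exactly the window between the first and last timestamps.
  const charsAfterFirst = samples
    .slice(1)
    .reduce((sum, s) => sum + s.chars, 0);
  const windowSeconds = (last.atMs - first.atMs) / 1000;

  return {
    timeToFirstDeltaMs: first.atMs - startMs,
    charsPerSecond: windowSeconds > 0 ? charsAfterFirst / windowSeconds : null,
  };
}

// Usage with made-up timings.
const summary = summarizeStream(0, [
  { atMs: 200, chars: 10 },
  { atMs: 700, chars: 20 },
  { atMs: 1200, chars: 30 },
]);
console.log(summary);
```

In a real app the samples would be pushed inside the streaming loop, for example with Date.now() and the length of each delta's text.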

WHY IT MATTERS

For any app that uses extended thinking (code review, multi-step reasoning, complex planning), the experience improves noticeably: users watch Claude work through a problem as it happens instead of waiting for a wall of text to appear.

HOW TO USE IT

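Pass budget_tokens in the thinking parameter and stream the request (stream: true on the raw API, or client.messages.stream() in the TypeScript SDK). Keep budget_tokens below max_tokens. The stream emits thinking blocks first, then text blocks. Handle content_block_delta events and check delta.type: 'thinking_delta' carries the reasoning and 'text_delta' carries the answer, so the internal monologue can be rendered separately.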

CLAUDE / TYPESCRIPT
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function streamWithThinking(prompt: string) {
  const stream = await client.messages.stream({
    model: "claude-sonnet-4-6",
    max_tokens: 16000,
    thinking: {
      type: "enabled",
      budget_tokens: 10000,
    },
    messages: [{ role: "user", content: prompt }],
  });

  let thinkingText = "";
  let responseText = "";

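  // Deltas arrive in order: thinking blocks first, then text blocks.
  for await (const event of stream) {
    if (event.type === "content_block_delta") {
      if (event.delta.type === "thinking_delta") {
        // Reasoning text: accumulate it and print it dimmed.
        thinkingText += event.delta.thinking;
        process.stdout.write("\x1b[2m"); // ANSI dim on
        process.stdout.write(event.delta.thinking);
        process.stdout.write("\x1b[0m"); // ANSI reset
      } else if (event.delta.type === "text_delta") {
        // Final answer text: accumulate it and print it normally.
        responseText += event.delta.text;
        process.stdout.write(event.delta.text);
      }
      // Other delta types (for example signature_delta) are ignored here.
    }
  }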

  return { thinking: thinkingText, response: responseText };
}

streamWithThinking(
  "Design a database schema for a multi-tenant SaaS app with row-level security."
).catch(console.error); // report API or network errors instead of an unhandled rejection
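
The delta handling in the function above can also be written as a pure function over events, which makes it easy to test without a network call. This is a minimal sketch under simplified assumptions: the type names (SketchEvent, Transcript) and helpers (applyEvent, collect) are hypothetical, are not SDK exports, and model only the fields the loop above reads.

```typescript
// Simplified, hypothetical event shapes for illustration only.
type ThinkingDelta = { type: "thinking_delta"; thinking: string };
type TextDelta = { type: "text_delta"; text: string };
type SignatureDelta = { type: "signature_delta"; signature: string };

type DeltaEvent = {
  type: "content_block_delta";
  delta: ThinkingDelta | TextDelta | SignatureDelta;
};
type OtherEvent = {
  type: "content_block_start" | "content_block_stop" | "message_stop";
};
type SketchEvent = DeltaEvent | OtherEvent;

interface Transcript {
  thinking: string; // accumulated reasoning text
  response: string; // accumulated answer text
}

// Fold one event into the transcript; events without text leave it unchanged.
function applyEvent(state: Transcript, event: SketchEvent): Transcript {
  if (event.type !== "content_block_delta") return state;
  const delta = event.delta;
  if (delta.type === "thinking_delta") {
    return { ...state, thinking: state.thinking + delta.thinking };
  }
  if (delta.type === "text_delta") {
    return { ...state, response: state.response + delta.text };
  }
  return state; // signature_delta carries no displayable text
}

// Reduce a full list of events into separate thinking and response strings.
function collect(events: SketchEvent[]): Transcript {
  return events.reduce(applyEvent, { thinking: "", response: "" });
}

// Usage with hand-written events.
const sampleEvents: SketchEvent[] = [
  { type: "content_block_start" },
  { type: "content_block_delta", delta: { type: "thinking_delta", thinking: "Tenants need " } },
  { type: "content_block_delta", delta: { type: "thinking_delta", thinking: "isolation." } },
  { type: "content_block_stop" },
  { type: "content_block_delta", delta: { type: "text_delta", text: "Use a tenant_id column." } },
  { type: "message_stop" },
];
const transcript = collect(sampleEvents);
console.log(transcript.thinking);
console.log(transcript.response);
```

Keeping the accumulation separate from the terminal output means the same reducer could feed a UI state store, with the dimmed printing left to the rendering layer.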
extended-thinking streaming performance api
ORIGINAL SOURCE
https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking

Claude Sonnet 4.6 extended thinking is now available with streaming at 2x throughput, making it practical to build real-time reasoning UIs without the spinner-of-doom.
