Claude Sonnet 4.6: Extended Thinking Streams 2x Faster
Extended thinking in claude-sonnet-4-6 now streams at 2x the previous throughput. Internal reasoning tokens arrive in real-time via the thinking content block, with no additional latency penalty on the first token.
For any app using extended thinking — code review, multi-step reasoning, complex planning — the UX dramatically improves. Users see Claude working through problems as it happens, not waiting for a wall of text to appear.
Pass budget_tokens in the thinking parameter alongside stream: true. The stream emits thinking blocks first, then text blocks. Parse content_block_delta events where type is 'thinking' to render the internal monologue separately.
Claude Sonnet 4.6 extended thinking is now available with streaming at 2x throughput, making it practical to build real-time reasoning UIs without the spinner-of-doom.