Claude Sonnet 4.6: Extended Thinking Streams 2x Faster
Extended thinking in claude-sonnet-4-6 now streams at 2x the previous throughput. Internal reasoning tokens arrive in real-time via the thinking content block, with no additional latency penalty on the first token.
For any app using extended thinking — code review, multi-step reasoning, complex planning — the UX dramatically improves. Users see Claude working through problems as it happens, not waiting for a wall of text to appear.
Pass budget_tokens in the thinking parameter alongside stream: true. The stream emits thinking blocks first, then text blocks. Parse content_block_delta events where type is 'thinking' to render the internal monologue separately.