AI & LLMs · Claude · Next.js · Streaming · Vercel AI SDK

Streaming Claude Responses in Next.js with the Vercel AI SDK

Real-time AI output using Vercel AI SDK + Claude streaming API in a Next.js App Router project.

March 31, 2026


Waiting two seconds for an AI response to appear all at once feels slow. Watching it stream in word by word feels fast — even if the total time is identical. Streaming isn't just a UX improvement; it's the difference between an app that feels alive and one that feels broken.

This tutorial wires up Claude streaming in a Next.js App Router project using the Vercel AI SDK.

What You'll Build

A Next.js route that streams Claude responses to the client as they're generated. The front-end renders tokens as they arrive with no full-page refresh, using React state and the Vercel AI SDK's useChat hook.

Setup

Install the dependencies:

pnpm add ai @anthropic-ai/sdk

Set your API key in .env.local:

ANTHROPIC_API_KEY=sk-ant-...
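If the key is missing, the Anthropic client only fails later with a generic authentication error. A small fail-fast guard surfaces the problem immediately; `requireEnv` below is a hypothetical helper, not part of either SDK:

```typescript
// Hypothetical helper: throw a descriptive error when an environment
// variable is absent, rather than letting the API call fail with an
// opaque 401 later.
function requireEnv(name: string): string {
  const value = process.env[name]
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// e.g. at the top of the route: requireEnv('ANTHROPIC_API_KEY')
```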

The API Route

Create app/api/chat/route.ts. This is the streaming endpoint:

import Anthropic from '@anthropic-ai/sdk'
import { AnthropicStream, StreamingTextResponse } from 'ai'

const client = new Anthropic()

export async function POST(req: Request) {
  const { messages } = await req.json()

  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    stream: true,
    messages,
  })

  const stream = AnthropicStream(response)
  return new StreamingTextResponse(stream)
}

AnthropicStream adapts Claude's native event stream to the format the Vercel AI SDK expects, and StreamingTextResponse wraps it in a streaming HTTP response with the right headers (Content-Type: text/plain; charset=utf-8); the chunked transfer encoding itself is handled by the HTTP layer.
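One thing worth guarding server-side: the client sends its accumulated messages array verbatim, and the Anthropic Messages API only accepts `user` and `assistant` roles. A small filter (a sketch, not part of either SDK) keeps stray entries out of the request:

```typescript
// Sketch: the Anthropic Messages API rejects roles other than 'user' and
// 'assistant' (system instructions go in the top-level `system` parameter),
// so drop anything else before forwarding the client's array.
type Role = 'user' | 'assistant'
type IncomingMessage = { role: string; content: string }

function toAnthropicMessages(messages: IncomingMessage[]) {
  return messages
    .filter(m => m.role === 'user' || m.role === 'assistant')
    .map(m => ({ role: m.role as Role, content: m.content }))
}
```

In the route, pass `toAnthropicMessages(messages)` to `client.messages.create` instead of the raw array.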

The Chat Component

'use client'

import { useChat } from 'ai/react'

export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  })

  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="flex-1 overflow-y-auto space-y-4 pb-4">
        {messages.map(m => (
          <div
            key={m.id}
            className={`p-3 rounded-lg ${
              m.role === 'user'
                ? 'bg-blue-100 ml-8'
                : 'bg-gray-100 mr-8'
            }`}
          >
            <p className="text-sm font-semibold capitalize mb-1">{m.role}</p>
            <p className="whitespace-pre-wrap">{m.content}</p>
          </div>
        ))}
        {/* Only show the placeholder before the first token arrives; once the
            assistant message starts streaming, it renders above instead. */}
        {isLoading && messages[messages.length - 1]?.role !== 'assistant' && (
          <div className="bg-gray-100 mr-8 p-3 rounded-lg">
            <p className="text-sm font-semibold mb-1">assistant</p>
            <p className="text-gray-400">Thinking…</p>
          </div>
        )}
      </div>

      <form onSubmit={handleSubmit} className="flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask something…"
          className="flex-1 border rounded-lg px-3 py-2 focus:outline-none focus:ring-2"
          disabled={isLoading}
        />
        <button
          type="submit"
          disabled={isLoading || !input.trim()}
          className="bg-blue-500 text-white px-4 py-2 rounded-lg disabled:opacity-50"
        >
          Send
        </button>
      </form>
    </div>
  )
}

useChat handles the entire lifecycle: sending the request, accumulating streamed tokens into messages, and toggling isLoading. You get a fully functional streaming chat interface in about 50 lines of component code.
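Under the hood, the token accumulation is just a loop over the response body's stream. The core of it looks roughly like this (a sketch using standard web stream APIs, not the SDK's actual internals):

```typescript
// Read a streaming response body chunk by chunk, decoding each Uint8Array
// and appending it to the accumulated text. This is the essence of what
// useChat does on every token.
async function accumulate(
  body: ReadableStream<Uint8Array>,
  onToken: (textSoFar: string) => void,
): Promise<string> {
  const reader = body.getReader()
  const decoder = new TextDecoder()
  let text = ''
  for (;;) {
    const { done, value } = await reader.read()
    if (done) break
    text += decoder.decode(value, { stream: true })
    onToken(text) // in React, this would be a state update that re-renders the message
  }
  return text
}
```

Each `onToken` call is what makes the UI repaint mid-response instead of waiting for the full completion.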

Streaming for Non-Chat Use Cases

Not every streaming use case is a chat interface. For a one-shot generation (e.g., blog post generation, code explanation), use useCompletion instead:

'use client'

import { useCompletion } from 'ai/react'

export default function GeneratorPage() {
  const { completion, input, handleInputChange, handleSubmit, isLoading } =
    useCompletion({ api: '/api/generate' })

  return (
    <form onSubmit={handleSubmit} className="space-y-4">
      <textarea
        value={input}
        onChange={handleInputChange}
        placeholder="Describe the post you want to generate…"
        className="w-full border rounded-lg p-3 h-32"
      />
      <button type="submit" disabled={isLoading}>
        {isLoading ? 'Generating…' : 'Generate'}
      </button>
      {completion && (
        <div className="prose max-w-none mt-4 whitespace-pre-wrap">
          {completion}
        </div>
      )}
    </form>
  )
}

The API route for this is nearly identical in structure, with one difference: useCompletion sends a `prompt` string in the request body rather than a `messages` array, so the route wraps it in a single user message.
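Concretely, app/api/generate/route.ts can look like this; note that useCompletion posts `{ prompt }` rather than `{ messages }`, so the route wraps it in a single user message (a sketch following the same AnthropicStream pattern as the chat route):

```typescript
import Anthropic from '@anthropic-ai/sdk'
import { AnthropicStream, StreamingTextResponse } from 'ai'

const client = new Anthropic()

export async function POST(req: Request) {
  // useCompletion sends { prompt }, not { messages }
  const { prompt } = await req.json()

  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    stream: true,
    messages: [{ role: 'user', content: prompt }],
  })

  return new StreamingTextResponse(AnthropicStream(response))
}
```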

Handling System Prompts

Pass the system prompt from the API route, not from the client. Keeping your instructions server-side means a user can't override or tamper with them from the browser:

export async function POST(req: Request) {
  const { messages } = await req.json()

  const response = await client.messages.create({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    stream: true,
    system: 'You are a technical writing assistant. Be concise and precise.',
    messages,
  })

  const stream = AnthropicStream(response)
  return new StreamingTextResponse(stream)
}

Key Takeaways

  • AnthropicStream + StreamingTextResponse from the Vercel AI SDK handle the streaming plumbing; you just pass Claude's native stream
  • useChat gives you a complete chat interface with one hook; useCompletion covers single-turn generation
  • Keep system prompts in the API route — never trust the client with prompt construction
  • Streaming doesn't reduce time-to-complete; it reduces perceived latency, which matters more for UX