Streaming AI Content and Real-Time Markdown Rendering on the Web

In the AI era, expectations for response speed keep rising, and the traditional “wait for the full response” UX no longer works well. I recently implemented streaming output in a project and hit many pitfalls, so here is a practical guide.

What is streaming?

Streaming means the server continuously pushes data to the client, which processes and renders each piece as it arrives instead of waiting for the full payload. Compared with classic request-response, users start seeing output as soon as the first tokens are generated.

Core tech stack

1. Server-Sent Events (SSE)

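A lightweight protocol for one-way server-to-client push. The browser's built-in EventSource only issues GET requests, so to send a JSON body the client uses a plain fetch POST and reads the SSE-formatted response body itself: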

```ts
const response = await fetch('/api/chat-stream', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ question: 'hello' })
});
```
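On the wire, SSE is just text: each event is one or more `data:` lines followed by a blank line, optionally preceded by an `event:` name. The sketch below formats a single event; `formatSSEEvent` is a hypothetical helper, not part of any library. It also shows how the format carries a payload that contains newlines: one `data:` line per line of text.

```typescript
// Hypothetical helper: serialize one Server-Sent Event.
// A payload containing newlines is split across several `data:` lines;
// the browser's EventSource joins them back together with "\n".
function formatSSEEvent(payload: string, event?: string): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  for (const line of payload.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  // A blank line (the second "\n") terminates the event.
  return lines.join('\n') + '\n\n';
}
```

The backend in this guide sidesteps multi-line payloads by JSON-encoding each token, which is also what its line-by-line frontend parser expects.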

2. ReadableStream API

Native browser API for streaming reads:

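```ts
if (!response.body) throw new Error('Response has no body to stream');
const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true holds back multi-byte characters split across chunks
  const chunk = decoder.decode(value, { stream: true });
  processChunk(chunk); // your handler, e.g. the SSE line parsing in Step 2
}
```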
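To exercise the read loop without a server, you can build an in-memory ReadableStream and feed it bytes in arbitrary pieces. A minimal sketch, assuming a runtime with the standard Streams and Encoding APIs (modern browsers, Node 18+); `readStreamText` and `streamFromChunks` are hypothetical names. Note the final `decoder.decode()` call, which flushes any bytes the decoder is still holding when the stream ends.

```typescript
// Read a byte stream to completion, decoding UTF-8 incrementally.
async function readStreamText(
  stream: ReadableStream<Uint8Array>,
  onText: (text: string) => void
): Promise<void> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      const text = decoder.decode(value, { stream: true });
      if (text) onText(text); // may be empty if the chunk ended mid-character
    }
    // Flush bytes still buffered in the decoder.
    const tail = decoder.decode();
    if (tail) onText(tail);
  } finally {
    reader.releaseLock();
  }
}

// Build an in-memory stream so the loop can run without a network.
function streamFromChunks(chunks: Uint8Array[]): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(chunk);
      controller.close();
    },
  });
}
```

For example, splitting the bytes of "héllo" between the two bytes of "é" still yields "héllo"; the first callback simply receives "h" while the decoder waits for the rest of the character.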

Implementation steps

Step 1: Backend streaming API

The backend responds in SSE format: each event is a `data:` line, and a blank line (`\n\n`) ends the event:

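```js
// Node.js (Express) backend example
import express from 'express';

const app = express();
app.use(express.json()); // parse JSON request bodies into req.body

app.post('/api/generate-content-stream', async (req, res) => {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive'
  });

  // Simulated streaming: generateContent() (not shown) returns the full
  // result, which is then replayed token by token. With a real model,
  // write each token as the model yields it instead.
  const content = await generateContent(req.body);

  for (const token of content) {
    // JSON-encode each token so newlines inside it cannot break SSE framing
    const data = JSON.stringify({ content: token });
    res.write(`data: ${data}\n\n`);
  }

  res.write('data: [DONE]\n\n');
  res.end();
});
```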
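The handler above awaits the complete result before writing anything, so the client sees a replay rather than true streaming. If your model client exposes tokens as an async iterable (an assumption; check your SDK), you can write each token as it arrives and stop when the client disconnects. `pipeTokensAsSSE` and the `SSEResponse` interface are hypothetical; a Node/Express `res` object satisfies the interface.

```typescript
// Minimal response interface so the sketch does not depend on Express types.
interface SSEResponse {
  write(chunk: string): unknown;
  end(): unknown;
}

// Write each token as soon as the token source yields it.
async function pipeTokensAsSSE(
  tokens: AsyncIterable<string>,
  res: SSEResponse,
  signal?: AbortSignal
): Promise<void> {
  for await (const token of tokens) {
    if (signal?.aborted) {
      // Client went away: stop writing. Returning here also calls the
      // iterator's return(), so a generator-based source can clean up.
      res.end();
      return;
    }
    res.write(`data: ${JSON.stringify({ content: token })}\n\n`);
  }
  res.write('data: [DONE]\n\n');
  res.end();
}
```

In the Express handler you might create an `AbortController`, call `controller.abort()` from `res.on('close', ...)`, and pass `controller.signal` as the third argument; aborting after a normal finish is harmless.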

Step 2: Frontend stream consumption

```ts
async function streamGenerateContent(
  params: GenerateParams,
  onChunk: (content: string) => void
) {
  const response = await fetch('/api/generate-content-stream', {
    method: 'POST',
    // Tell the server the body is JSON (body parsers such as express.json() rely on it)
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(params)
  });
  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters split across chunks intact
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() || ''; // keep the incomplete trailing line for the next read

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = line.slice(6);
        if (data && data !== '[DONE]') {
          try {
            const parsed = JSON.parse(data);
            if (parsed.content) {
              // Hand each chunk to the caller, which appends it (never replaces)
              onChunk(parsed.content);
            }
          } catch (e) {
            console.error('JSON parse failed:', e);
          }
        }
      }
    }
  }
}
```
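The nested checks in the read loop can be pulled out into a small pure function that is easy to unit-test. This sketch (`parseSSELine` is a hypothetical name) also tolerates two things the loop above does not: a `data:` field without the optional space, and CRLF line endings, both of which the SSE format allows.

```typescript
type SSELineResult =
  | { kind: 'content'; content: string }
  | { kind: 'done' }
  | { kind: 'skip' };

// Hypothetical helper: classify one line of the SSE response body.
function parseSSELine(rawLine: string): SSELineResult {
  // Drop a trailing "\r" left over from CRLF line endings.
  const line = rawLine.endsWith('\r') ? rawLine.slice(0, -1) : rawLine;
  // Blank separators, comments, and event:/id: fields carry no content here.
  if (!line.startsWith('data:')) return { kind: 'skip' };
  // The single space after "data:" is optional in the SSE format.
  let data = line.slice(5);
  if (data.startsWith(' ')) data = data.slice(1);
  if (data === '[DONE]') return { kind: 'done' };
  try {
    const parsed = JSON.parse(data) as { content?: unknown };
    return typeof parsed.content === 'string' && parsed.content !== ''
      ? { kind: 'content', content: parsed.content }
      : { kind: 'skip' };
  } catch {
    return { kind: 'skip' }; // malformed JSON (or null): ignore this line
  }
}
```

In the loop, call `parseSSELine(line)` and pass `content` results to `onChunk`; on `done` you could call `reader.cancel()` and return instead of waiting for the server to close the connection.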

Step 3: Real-time Markdown rendering

Use Streamdown for incremental Markdown rendering:

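```bash
npm install streamdown
```

```tsx
import { Streamdown } from 'streamdown';

// isLoading should only cover the wait for the first chunk (see "First-chunk-first UX");
// if it stays true for the whole stream, the text stays hidden until generation ends.
function ContentReader({ content, isLoading }: { content: string; isLoading: boolean }) {
  return (
    <div className="markdown-content">
      {isLoading ? (
        <div>Generating...</div>
      ) : (
        <Streamdown>{content}</Streamdown>
      )}
    </div>
  );
}
```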

Why Streamdown?

✅ Handles incomplete Markdown fragments
✅ Supports incremental rendering
✅ Gracefully handles unclosed syntax, such as a bold marker or code fence that has not been closed yet
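To make “unclosed syntax” concrete: mid-stream, the text often ends in something like `Hello **wor`, which a plain Markdown renderer shows with literal asterisks until the closing `**` arrives. The sketch below patches just that one case. It is a hypothetical illustration of the idea (`closeUnbalancedBold` is an invented name), not Streamdown's implementation, and it ignores code spans, fences, and other emphasis markers.

```typescript
// Hypothetical helper: close a trailing unmatched "**" so a partial
// "**bold" renders as bold instead of showing literal asterisks.
// Simplification: does not skip code spans/fences or handle "*" or "_".
function closeUnbalancedBold(markdown: string): string {
  const parts = markdown.split('**');
  const markerCount = parts.length - 1;
  const afterLast = parts[parts.length - 1];
  // Patch only when an opener is unmatched and some text already follows it.
  if (markerCount % 2 === 1 && afterLast.trim() !== '') {
    // Trim first: a closing "**" preceded by whitespace would not close emphasis.
    return markdown.trimEnd() + '**';
  }
  return markdown;
}
```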

Step 4: State management and incremental updates

```tsx
import { useState } from 'react';
import { Streamdown } from 'streamdown';

interface Message {
  id: string;
  type: 'user' | 'ai';
  content: string;
}

function ChatPanel() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = async (question: string) => {
    // Create the user message (string ids keep the Message type consistent)
    const userMsg: Message = { id: `user-${Date.now()}`, type: 'user', content: question };
    setMessages(prev => [...prev, userMsg]);

    // Create an empty AI placeholder message that chunks will be appended to
    const aiMsgId = `ai-${Date.now()}`;
    const aiMsg: Message = { id: aiMsgId, type: 'ai', content: '' };
    setMessages(prev => [...prev, aiMsg]);
    setIsStreaming(true);

    try {
      // Stream the AI response. streamChat (not shown) follows the same
      // (params, onChunk) pattern as streamGenerateContent in Step 2.
      // `messages` is this render's state: the history before this question.
      await streamChat({ question, history: messages }, (chunk: string) => {
        // Incrementally append to the AI message only (critical)
        setMessages(prev => prev.map(msg =>
          msg.id === aiMsgId
            ? { ...msg, content: msg.content + chunk }
            : msg
        ));
      });
    } finally {
      // Reset even if the stream throws
      setIsStreaming(false);
    }
  };

  return (
    <div className="chat-container">
      {messages.map(msg => (
        <div key={msg.id} className={msg.type}>
          <Streamdown>{msg.content}</Streamdown>
        </div>
      ))}
    </div>
  );
}
```
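The update inside the chunk callback can be factored into a pure helper, which makes the append-only rule easy to test. A sketch with a hypothetical `appendToMessage` name: only the target message is copied, and every other message keeps its object identity, which is what lets a memoized message row skip re-rendering.

```typescript
interface Message {
  id: string;
  type: 'user' | 'ai';
  content: string;
}

// Return a new array where only the target message has the chunk appended.
// Untouched messages are the same objects as before (identity preserved).
function appendToMessage(messages: Message[], id: string, chunk: string): Message[] {
  return messages.map(msg =>
    msg.id === id ? { ...msg, content: msg.content + chunk } : msg
  );
}
```

Inside the component this becomes `setMessages(prev => appendToMessage(prev, aiMsgId, chunk))`.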

Key implementation points

1. Accumulate, don’t replace

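```ts
// ❌ Wrong: overwriting state with each chunk shows only the latest chunk,
// so earlier text disappears and the output jumps around
setContent(newChunk);

// ✅ Correct: a functional update appends each chunk to the latest state
setContent(prev => prev + newChunk);
```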
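Appending on every chunk is correct, but a fast model can trigger dozens of state updates per second. One option (an assumption, not necessarily what the project above does) is to coalesce chunks and flush them on a short timer. `createChunkBatcher` is a hypothetical helper, and the 50 ms default is an arbitrary example value.

```typescript
// Hypothetical helper: collect chunks and flush them at most once per interval.
function createChunkBatcher(onFlush: (text: string) => void, intervalMs = 50) {
  let pending = '';
  let timer: ReturnType<typeof setTimeout> | undefined;

  const flush = () => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (pending) {
      const text = pending;
      pending = '';
      onFlush(text);
    }
  };

  const push = (chunk: string) => {
    pending += chunk;
    // Schedule one flush per interval; later chunks join the same batch.
    if (timer === undefined) timer = setTimeout(flush, intervalMs);
  };

  return { push, flush };
}
```

Pass `batcher.push` as the chunk callback, append in `onFlush` with `setContent(prev => prev + text)`, and call `batcher.flush()` once the stream ends so the last batch is not lost.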

2. Buffer management

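Network chunks do not line up with SSE line boundaries: a single read can end halfway through a `data:` line. Keep the incomplete tail in a buffer and prepend it to the next chunk:

```ts
let buffer = '';
buffer += chunk;
const lines = buffer.split('\n');
buffer = lines.pop() || ''; // keep incomplete trailing line
```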
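The same buffering logic can be wrapped in a small class with one addition: when the stream ends, any text left in the buffer is a final line that simply had no trailing newline, and it should still be processed. `LineBuffer` is a hypothetical name.

```typescript
// Hypothetical helper: turn arbitrary text chunks into complete lines.
class LineBuffer {
  private buffer = '';

  // Append a chunk and return every line it completed.
  push(chunk: string): string[] {
    this.buffer += chunk;
    const lines = this.buffer.split('\n');
    this.buffer = lines.pop() ?? ''; // keep the incomplete trailing line
    return lines;
  }

  // Call once the stream is done: returns a final line that lacked "\n".
  flush(): string[] {
    const rest = this.buffer;
    this.buffer = '';
    return rest ? [rest] : [];
  }
}
```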

3. First-chunk-first UX

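```ts
// setLoading / setContent are React state setters; params is the request payload
let isFirstChunk = true; // one flag per request
await streamGenerateContent(params, (chunk) => {
  if (isFirstChunk) {
    setLoading(false); // hide loading immediately on the first chunk
    isFirstChunk = false;
  }
  setContent(prev => prev + chunk);
});
```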
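If you reuse this pattern in several places, the flag can live in a small wrapper created once per request, so it cannot leak between requests or be reset by a re-render. A sketch with a hypothetical `withFirstChunkHook` name:

```typescript
// Hypothetical wrapper: run onFirst exactly once, right before the first chunk is handled.
function withFirstChunkHook(
  onChunk: (chunk: string) => void,
  onFirst: () => void
): (chunk: string) => void {
  let seenFirst = false; // scoped to this wrapper, i.e. to one request
  return chunk => {
    if (!seenFirst) {
      seenFirst = true;
      onFirst();
    }
    onChunk(chunk);
  };
}
```

Usage might look like `streamGenerateContent(params, withFirstChunkHook(chunk => setContent(prev => prev + chunk), () => setLoading(false)))`; `onFirst` is also a convenient place to record time to first chunk.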

Best practices

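Error handling: every streaming function should expose an onError callback (or reject its promise) so the UI can leave the loading state and show the failure
Connection management: the browser's EventSource reconnects automatically, but a fetch-based stream like the one in this guide does not, so implement retries yourself and abort requests you no longer need with an AbortController
Performance: wrap callbacks in useCallback and message components in React.memo, so each chunk re-renders only the message that is streaming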
UX details:

Show “Generating…” feedback
Hide loading right after first chunk arrives
Provide a cancel option
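For the cancel option, an `AbortController` gives the UI a single handle to stop reading. The sketch below (hypothetical `readUntilAborted`) cancels the stream reader when the signal fires, which makes a pending `read()` resolve with `done: true`. If you also pass the same signal to `fetch`, aborting can make `read()` reject instead, so that case is treated as cancellation too.

```typescript
// Read a stream until it ends or the signal is aborted (e.g. by a Cancel button).
async function readUntilAborted(
  stream: ReadableStream<Uint8Array>,
  signal: AbortSignal,
  onText: (text: string) => void
): Promise<'completed' | 'cancelled'> {
  const reader = stream.getReader();
  const decoder = new TextDecoder();
  // Cancelling the reader makes a pending read() resolve with done: true.
  const onAbort = () => {
    reader.cancel().catch(() => {});
  };
  signal.addEventListener('abort', onAbort, { once: true });
  try {
    while (true) {
      if (signal.aborted) {
        await reader.cancel(); // also covers a signal aborted before the first read
        return 'cancelled';
      }
      const { done, value } = await reader.read();
      if (done) return signal.aborted ? 'cancelled' : 'completed';
      onText(decoder.decode(value, { stream: true }));
    }
  } catch (err) {
    // With the same signal passed to fetch(), read() may reject on abort.
    if (signal.aborted) return 'cancelled';
    throw err;
  } finally {
    signal.removeEventListener('abort', onAbort);
  }
}
```

In a component you might keep the controller in a ref, call `abort()` from the Cancel button, and abort again on unmount so a stream does not keep updating state for a component that is gone.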

Full example

Reference implementation in LearnOS:

Streaming API: src/lib/api/courseService.ts (line 539)
Realtime rendering: src/components/learning/organisms/MainContentReader.tsx (line 111)

Pitfalls I hit

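react-markdown is hard to make smooth in true streaming scenarios: it re-parses the whole string on every update and renders unfinished syntax (such as an unclosed bold marker or link) literally until the closing part arrives.
When the backend streams message chunks, encode them carefully: JSON-encode each payload so newlines inside the text cannot break the data: framing, and decode on the frontend with a single TextDecoder using { stream: true }; otherwise multi-byte characters split across network chunks come out garbled.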
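The decoding half of that pitfall is easy to reproduce. In UTF-8, `€` takes three bytes; if a network chunk boundary falls inside it, decoding each chunk with a fresh decoder produces replacement characters (U+FFFD), while a single decoder with `{ stream: true }` carries the partial bytes over. The function names are illustrative.

```typescript
// "€" is three bytes in UTF-8 (E2 82 AC); split it across two "network" chunks.
const bytes = new TextEncoder().encode('€');
const chunkA = bytes.slice(0, 2);
const chunkB = bytes.slice(2);

// Naive: a fresh decoder per chunk cannot see the rest of the character.
function decodeNaively(chunks: Uint8Array[]): string {
  return chunks.map(c => new TextDecoder().decode(c)).join('');
}

// Streaming: one decoder with { stream: true } keeps partial bytes between calls.
function decodeStreaming(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder();
  let out = '';
  for (const c of chunks) out += decoder.decode(c, { stream: true });
  return out + decoder.decode(); // flush at end of stream
}
```

This is why the frontend read loop keeps one `TextDecoder` for the whole response and passes `{ stream: true }` on every call.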

Summary

The core of streaming + real-time Markdown rendering:

Push data continuously from the server in SSE format
Read stream chunks through ReadableStream
Render Markdown incrementally with Streamdown
Use append-based state updates to avoid flicker

With this architecture, you can deliver a smooth “results appear as the model thinks” experience.