Streaming AI Content and Real-Time Markdown Rendering on the Web
In the AI era, expectations for response speed keep rising, and the traditional “wait for the full response” UX no longer works well. I recently implemented streaming output in a project and hit many pitfalls, so here is a practical guide.
What is streaming?
Streaming means the server continuously pushes data to the client, and the client processes and renders it in real time without waiting for the full payload. Compared with classic request-response, streaming lets users see generation immediately.
Core tech stack
1. Server-Sent Events (SSE)
A lightweight protocol for one-way server-to-client push:
    // EventSource only supports GET requests, so a POST-based stream is opened with fetch.
    const response = await fetch('/api/chat-stream', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ question: 'hello' })
    });

2. ReadableStream API
Native browser API for streaming reads:
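    const reader = response.body?.getReader();
    if (!reader) throw new Error('Response has no readable body');
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // { stream: true } keeps multi-byte characters intact across chunk boundaries
      const chunk = decoder.decode(value, { stream: true });
      processChunk(chunk);
    }

Implementation steps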
Step 1: Backend streaming API
The backend responds in SSE format: each event is a line starting with data: and is terminated by a blank line (\n\n):

    // Node.js backend example
    app.post('/api/generate-content-stream', async (req, res) => {
      res.writeHead(200, {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        'Connection': 'keep-alive'
      });

      // Simulate streamed token generation
      const content = await generateContent(req.body);

      for (const token of content) {
        // JSON.stringify escapes newlines, so each event stays on one data: line
        const data = JSON.stringify({ content: token });
        res.write(`data: ${data}\n\n`);
      }

      // Sentinel that tells the client the stream is complete
      res.write('data: [DONE]\n\n');
      res.end();
    });

Step 2: Frontend stream consumption
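On the wire, the backend emits frames consisting of a data: line followed by a blank line, and the network may cut those frames at arbitrary points. As a minimal sketch of the framing logic a client needs, the hypothetical helper below (parseSseBuffer and SseParseResult are illustrative names, not part of any library) splits a text buffer into complete data payloads and returns the unfinished remainder. It handles only the data: field and ignores other SSE fields such as event: and id:.

```typescript
interface SseParseResult {
  payloads: string[]; // data payloads of all complete events
  rest: string;       // trailing text that does not yet form a complete event
}

// Hypothetical helper: extracts complete SSE events from a text buffer.
function parseSseBuffer(buffer: string): SseParseResult {
  // Normalise CRLF so that an event boundary is always "\n\n".
  const normalised = buffer.replace(/\r\n/g, '\n');
  const blocks = normalised.split('\n\n');
  // The last block is incomplete until its terminating blank line arrives.
  const rest = blocks.pop() ?? '';

  const payloads: string[] = [];
  for (const block of blocks) {
    const dataLines = block
      .split('\n')
      .filter(line => line.startsWith('data:'))
      // Drop the field name and one optional space after the colon.
      .map(line => line.slice(5).replace(/^ /, ''));
    if (dataLines.length > 0) {
      // Several data lines in one event are joined with "\n".
      payloads.push(dataLines.join('\n'));
    }
  }
  return { payloads, rest };
}
```

A consumer could call parseSseBuffer(rest + decodedChunk) on every read and carry rest over to the next iteration. The function that follows inlines a simpler, line-based version of the same idea.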
    async function streamGenerateContent(
      params: GenerateParams,
      onChunk: (content: string) => void
    ) {
      const response = await fetch('/api/generate-content-stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(params)
      });
      if (!response.ok || !response.body) {
        throw new Error(`Stream request failed with status ${response.status}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        buffer = lines.pop() || ''; // keep the incomplete trailing line for the next read

        for (const line of lines) {
          if (!line.startsWith('data: ')) continue;
          const data = line.slice(6);
          if (!data || data === '[DONE]') continue;
          try {
            const parsed = JSON.parse(data);
            if (parsed.content) {
              // Pass each chunk on; the caller appends it (append, do not replace)
              onChunk(parsed.content);
            }
          } catch (e) {
            console.error('JSON parse failed:', e);
          }
        }
      }
    }

Step 3: Real-time Markdown rendering
Use Streamdown for incremental Markdown rendering:
    npm install streamdown

    import { Streamdown } from 'streamdown';

    function ContentReader({ content, isLoading }) {
      return (
        <div className="markdown-content">
          {isLoading ? (
            <div>Generating...</div>
          ) : (
            <Streamdown>{content}</Streamdown>
          )}
        </div>
      );
    }

Why Streamdown?
- ✅ Handles incomplete Markdown fragments
- ✅ Supports incremental rendering
- ✅ Gracefully handles unclosed syntax
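To make "unclosed syntax" concrete, here is a minimal sketch of the general idea: before each render, temporarily close any code fence or bold marker that the stream has opened but not yet finished. This is an illustration only and is not Streamdown's actual algorithm. The name closeUnfinishedMarkdown is hypothetical, and the sketch covers just two constructs.

```typescript
// Three backticks, built with repeat() so the sample itself contains no literal fence.
const FENCE = '`'.repeat(3);

// Hypothetical helper: NOT Streamdown's implementation, just the general idea.
function closeUnfinishedMarkdown(partial: string): string {
  let inFence = false;
  const proseLines: string[] = [];

  for (const line of partial.split('\n')) {
    if (line.trimStart().startsWith(FENCE)) {
      inFence = !inFence; // a fence line toggles the code-block state
    } else if (!inFence) {
      proseLines.push(line); // only text outside code blocks is checked for bold markers
    }
  }

  if (inFence) {
    // Inside an open code block nothing else is parsed as Markdown, so only close the fence.
    return partial + (partial.endsWith('\n') ? '' : '\n') + FENCE;
  }

  // Ignore inline code spans, then count ** markers in the remaining prose.
  const prose = proseLines.join('\n').replace(/`[^`\n]*`/g, '');
  const boldMarkers = prose.match(/\*\*/g) ?? [];
  return boldMarkers.length % 2 === 1 ? partial + '**' : partial;
}
```

With a renderer that lacks this behaviour, a helper like this would have to run on the accumulated content before every render. A streaming-aware renderer takes that step off the application's hands.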
Step 4: State management and incremental updates
    import { useState } from 'react';
    import { Streamdown } from 'streamdown';

    function ChatPanel() {
      const [messages, setMessages] = useState<Message[]>([]);
      const [isStreaming, setIsStreaming] = useState(false);

      const sendMessage = async (question: string) => {
        // Create user message (string id, consistent with the AI message id below)
        const userMsg: Message = { id: `user-${Date.now()}`, type: 'user', content: question };
        setMessages(prev => [...prev, userMsg]);

        // Create AI placeholder message
        const aiMsgId = `ai-${Date.now()}`;
        const aiMsg: Message = { id: aiMsgId, type: 'ai', content: '' };
        setMessages(prev => [...prev, aiMsg]);
        setIsStreaming(true);

        // Stream AI response (streamChat is assumed to work like
        // streamGenerateContent, plus a completion callback)
        await streamChat(
          { question, history: messages },
          (chunk: string) => {
            // Incrementally append AI message content (critical)
            setMessages(prev => prev.map(msg =>
              msg.id === aiMsgId
                ? { ...msg, content: msg.content + chunk }
                : msg
            ));
          },
          () => setIsStreaming(false)
        );
      };

      return (
        <div className="chat-container">
          {messages.map(msg => (
            <div key={msg.id} className={msg.type}>
              <Streamdown>{msg.content}</Streamdown>
            </div>
          ))}
        </div>
      );
    }

Key implementation points
1. Accumulate, don’t replace
    // ❌ Wrong: replacing content causes flicker
    setContent(newContent);

    // ✅ Correct: append chunks incrementally
    setContent(prev => prev + newChunk);

2. Buffer management

    let buffer = '';
    buffer += chunk;
    const lines = buffer.split('\n');
    buffer = lines.pop() || ''; // keep incomplete trailing line

3. First-chunk-first UX

    let isFirstChunk = true;
    onChunk((chunk) => {
      if (isFirstChunk) {
        setLoading(false); // hide loading immediately on first chunk
        isFirstChunk = false;
      }
      setContent(prev => prev + chunk);
    });

Best practices
- Error handling: every stream flow should expose an onError callback
- Connection management: SSE defines reconnection semantics, which the browser's EventSource API applies automatically; a fetch-based stream needs its own retry logic
- Performance: use useCallback to reduce unnecessary rerenders
- UX details:
  - Show “Generating…” feedback
  - Hide loading right after first chunk arrives
  - Provide a cancel option
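The error-handling and cancel points can be combined in one small wrapper. The sketch below assumes a hypothetical startStream helper that takes handler callbacks and returns a cancel function. It relies on the standard AbortController so that cancelling also aborts the underlying fetch. The names StreamHandlers and startStream and the injectable fetchImpl parameter are illustrative assumptions, and SSE line parsing is left out to keep the focus on the control flow.

```typescript
interface StreamHandlers {
  onChunk: (text: string) => void;
  onDone: () => void;
  onError: (error: unknown) => void;
}

// Hypothetical helper: starts the stream and returns a function that cancels it.
function startStream(
  url: string,
  body: unknown,
  handlers: StreamHandlers,
  fetchImpl: typeof fetch = fetch // injectable so the helper can be tested without a network
): () => void {
  const controller = new AbortController();

  (async () => {
    try {
      const response = await fetchImpl(url, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(body),
        signal: controller.signal, // aborting rejects the fetch and any pending read
      });
      if (!response.ok || !response.body) {
        throw new Error(`Stream request failed with status ${response.status}`);
      }
      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      while (true) {
        const { done, value } = await reader.read();
        if (done) break;
        handlers.onChunk(decoder.decode(value, { stream: true }));
      }
      handlers.onDone();
    } catch (error) {
      if (controller.signal.aborted) {
        handlers.onDone(); // a user-initiated cancel is treated as a normal stop
      } else {
        handlers.onError(error);
      }
    }
  })();

  return () => controller.abort();
}
```

In a React component, the returned function could be wired to a "Stop" button and also called from an effect cleanup, so that navigating away does not leave a stream running.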
Full example
Reference implementation in LearnOS:
- Streaming API: src/lib/api/courseService.ts (line 539)
- Realtime rendering: src/components/learning/organisms/MainContentReader.tsx (line 111)
Pitfalls I hit
- react-markdown is hard to make smooth in true streaming rendering scenarios.
- When the backend streams message chunks, make sure they are encoded correctly, or frontend decoding may produce mismatched content.
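One common form of the decoding pitfall is a multi-byte UTF-8 character being split across two network chunks. The sketch below, with hypothetical function names, compares decoding each chunk independently against reusing one TextDecoder with { stream: true }, which holds back incomplete byte sequences until the rest arrives.

```typescript
// Decode every chunk in isolation: a character split across chunks is corrupted.
function decodeIndependently(chunks: Uint8Array[]): string {
  return chunks.map(chunk => new TextDecoder().decode(chunk)).join('');
}

// Reuse one decoder in streaming mode: incomplete sequences are buffered internally.
function decodeStreaming(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder();
  let text = '';
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // the final call flushes any buffered bytes
}

// "你好" is six bytes in UTF-8; cut the stream in the middle of the second character.
const bytes = new TextEncoder().encode('你好');
const chunks = [bytes.slice(0, 4), bytes.slice(4)];

console.log(decodeStreaming(chunks));     // 你好
console.log(decodeIndependently(chunks)); // 你 followed by U+FFFD replacement characters
```

The same reasoning applies on the server side: writing complete UTF-8 strings per event, rather than slicing raw byte buffers, avoids producing frames that end in the middle of a character.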
Summary
The core of streaming + real-time Markdown rendering:
- Keep pushing data with SSE
- Read stream chunks through ReadableStream
- Render Markdown incrementally with Streamdown
- Use append-based state updates to avoid flicker
With this architecture, you can deliver a smooth “results appear as the model thinks” experience.