Pretext for AI: Token Streaming, Chat UIs, and LLM Output Layout

The single biggest application category for the Pretext text layout library, since its release, has been AI app frontends. ChatGPT-style streaming text, conversation history with thousands of turns, prompt playgrounds where the user pastes 10,000-token documents — these are the workloads that break naïve text rendering and where Pretext's measurement-during-render model pays for itself within hours of integration.

If you searched for "pretext ai" because you're building an AI chat UI, an LLM playground, or a document-analysis frontend, this page is for you. The patterns below are taken from production AI apps that integrated Pretext to fix specific symptoms.

Why AI Apps Are Different

AI apps share three properties that compound text-layout costs:

Property 1: Text arrives streaming, character by character or token by token. Each new token can change the wrap of the current line, the height of the current message, and (because of virtual scrolling) the position of every message above it.

Property 2: Messages are unbounded in length. A user can paste a novel; the model can return a multi-thousand-token response. A chat list mixes one-word "ok"s with twenty-screen code dumps.

Property 3: Conversation history is long. The full transcript can be hundreds of turns deep. Naïve scrolling — render every message — falls apart at the second hundred. You need virtualization, which needs accurate per-message heights.

These three properties together push you past what DOM-based measurement can reasonably handle. Pretext's two-phase model — prepare() once, layout() cheap — is exactly the shape that maps onto streaming + virtualizing.
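The shape of that model can be sketched with a toy stand-in. Everything below is illustrative: the greedy word-wrap and fixed character width are assumptions for the sketch, not Pretext's actual algorithm or API.

```typescript
// Toy stand-in for the two-phase model: an expensive prepare() that
// measures each word once, and a cheap layout() that only does arithmetic.
// The word-wrap logic is illustrative, not Pretext's actual algorithm.
type Prepared = { words: string[]; widths: number[] };

function toyPrepare(text: string, charWidth = 8): Prepared {
  const words = text.split(/\s+/).filter(Boolean);
  // In the real library this is the expensive step (font metrics, shaping).
  const widths = words.map((w) => w.length * charWidth);
  return { words, widths };
}

function toyLayout(p: Prepared, maxWidth: number, lineHeightPx: number) {
  const spaceWidth = 8;
  let lines = p.words.length > 0 ? 1 : 0;
  let lineWidth = 0;
  for (const w of p.widths) {
    const needed = lineWidth === 0 ? w : lineWidth + spaceWidth + w;
    if (needed > maxWidth && lineWidth > 0) {
      lines += 1; // wrap: start a new line with this word
      lineWidth = w;
    } else {
      lineWidth = needed;
    }
  }
  return { lines, height: lines * lineHeightPx };
}

// prepare() once…
const prepared = toyPrepare("the quick brown fox jumps over the lazy dog");
// …then layout() is cheap to call at any width (e.g. on container resize).
const narrow = toyLayout(prepared, 120, 24);
const wide = toyLayout(prepared, 1000, 24);
```

The point of the split: the per-word measurement never re-runs when only the width changes, which is exactly what streaming and virtualizing need.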

Streaming Token Rendering Without Layout Shift

The naïve approach: append each token to the current message's text, let the DOM re-layout, repeat. At 50 tokens/sec, that's 50 layouts/sec on a single message. Acceptable for short messages, painful for long ones (the layout cost grows with text length), catastrophic when the message is in a virtualized list (every layout invalidates the list's height calculations).

The Pretext approach: re-prepare on every token, re-layout to compute the new height, update the container's style.height accordingly. The cost is bounded: prepare() for a 1KB string is ~0.1ms, layout() is microseconds.

import { useMemo } from 'react';
import { prepare, layout } from '@chenglou/pretext';

function StreamingMessage({ text, width, font }: Props) {
  // Re-runs every time `text` changes (every new token)
  const prepared = useMemo(() => prepare(text, font), [text, font]);
  const { height } = useMemo(() => layout(prepared, width, 1.5), [prepared, width]);

  return (
    <div style={{ width, height, fontFamily: font, lineHeight: 1.5 }}>
      {text}
    </div>
  );
}

You can do better by not re-preparing on every token (prepare() is the most expensive call) and instead deferring re-prepare to a low-frequency tick:

import { useEffect, useMemo, useState } from 'react';
import { prepare, layout } from '@chenglou/pretext';

function StreamingMessageThrottled({ text, width, font }: Props) {
  const [throttledText, setThrottledText] = useState(text);

  useEffect(() => {
    const id = requestAnimationFrame(() => setThrottledText(text));
    return () => cancelAnimationFrame(id);
  }, [text]);

  const prepared = useMemo(() => prepare(throttledText, font), [throttledText, font]);
  const { height } = useMemo(() => layout(prepared, width, 1.5), [prepared, width]);

  return (
    <div style={{ width, height, fontFamily: font, lineHeight: 1.5 }}>
      {text}
    </div>
  );
}

The text update is immediate; the height update is throttled to the requestAnimationFrame cadence (~16ms at 60fps). The visual result is indistinguishable from a per-token update, at roughly 1/3 to 1/10 of the layout cost.
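The same coalescing idea, outside React: keep only the latest text snapshot and run the expensive call at most once per frame. The clock is factored out here so the behavior is easy to verify; in the browser, flush() would be the requestAnimationFrame callback.

```typescript
// Coalesce a burst of token events into at most one expensive call per
// "frame". push() records the latest snapshot; flush() stands in for the
// requestAnimationFrame callback that fires at the next frame boundary.
function createFrameCoalescer<T>(onFlush: (latest: T) => void) {
  let pending: T | undefined;
  let scheduled = false;
  return {
    push(value: T) {
      pending = value; // keep only the latest snapshot
      scheduled = true;
    },
    flush() {
      if (scheduled && pending !== undefined) {
        onFlush(pending);
      }
      scheduled = false;
    },
  };
}

let prepares = 0;
let lastSeen = "";
const coalescer = createFrameCoalescer<string>((text) => {
  prepares += 1; // this is where the expensive re-prepare would run
  lastSeen = text;
});

// 20 tokens arrive between two frames; only one expensive call happens.
let text = "";
for (let i = 0; i < 20; i++) {
  text += "tok ";
  coalescer.push(text);
}
coalescer.flush(); // frame boundary
```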

Auto-Scroll-to-Bottom That Doesn't Fight Streaming

A common chat-UI pattern: stick to the bottom of the scroll while a message streams, but let the user scroll up to read history. The implementation pain: how do you know how much to scroll when the bottom message is growing in height?

Without Pretext: you scroll to a moving target. Either you measure the message after each render (forced layout, jank) or you over-scroll then bounce back (visible jitter).

With Pretext: you know the new height before commit. You can compute the exact scroll delta and apply it in the same frame.

import { useLayoutEffect, useRef } from 'react';

function useStickyBottom(scrollRef: React.RefObject<HTMLDivElement>, height: number) {
  const lastHeightRef = useRef(height);

  useLayoutEffect(() => {
    const el = scrollRef.current;
    if (!el) return;
    const heightDelta = height - lastHeightRef.current;
    lastHeightRef.current = height;

    // scrollHeight already includes the new message height by layout-effect
    // time, so subtract the delta to test where the user was *before* growth
    const wasAtBottom =
      el.scrollHeight - heightDelta - el.scrollTop - el.clientHeight < 50;
    if (wasAtBottom && heightDelta > 0) {
      el.scrollTop += heightDelta;
    }
  }, [height, scrollRef]);
}

height here is the Pretext-computed total height of the current streaming message. The hook keeps you stuck to the bottom without the bouncing animation.
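The scroll arithmetic can be isolated as a pure function, which makes the "was the user at the bottom before the growth?" check explicit. The 50px tolerance is the same as in the hook; the function itself is a sketch, not part of Pretext.

```typescript
// Decide the new scrollTop after the bottom message grows by heightDelta.
// Inputs describe the scroll container *after* the DOM height change; the
// pre-growth distance-from-bottom is recovered by subtracting the delta.
function stickyScrollTop(opts: {
  scrollTop: number;
  scrollHeight: number; // already includes heightDelta
  clientHeight: number;
  heightDelta: number;
  tolerancePx?: number;
}): number {
  const { scrollTop, scrollHeight, clientHeight, heightDelta, tolerancePx = 50 } = opts;
  const distanceFromBottomBeforeGrowth =
    scrollHeight - heightDelta - scrollTop - clientHeight;
  const wasAtBottom = distanceFromBottomBeforeGrowth < tolerancePx;
  // Stuck: follow the growth. Scrolled up: leave the user where they are.
  return wasAtBottom && heightDelta > 0 ? scrollTop + heightDelta : scrollTop;
}
```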

Virtualized Chat With Streaming

The challenge that AI chat apps run into around their ten-thousandth message: the entire conversation needs to be virtualized (or rendering becomes a 10-second affair on entry), but the bottom message is streaming, which means its height is changing on every frame. Virtualizers handle the static case well; they handle the changing-bottom-row case poorly.

The Pretext-based architecture:

  1. Prepare every message at insert time: when a message arrives in the conversation, immediately call prepare() and store the result on the message object.
  2. Layout on demand for the virtualizer: the virtualizer's estimateSize callback calls layout() for each visible row. Pretext's pure-JS layout is so cheap that this is fine to call per scroll event.
  3. For the streaming message specifically: re-prepare on every token (or throttled to RAF), re-layout, and tell the virtualizer that the row's size has changed.

import { useVirtualizer } from '@tanstack/react-virtual';

const virtualizer = useVirtualizer({
  count: messages.length,
  getScrollElement: () => parentRef.current,
  estimateSize: (i) => layout(messages[i].prepared, width, 1.5).height + padding,
  overscan: 4,
});

// When the streaming message updates:
useEffect(() => {
  const lastIndex = messages.length - 1;
  messages[lastIndex].prepared = prepare(messages[lastIndex].text, font);
  virtualizer.measure(); // force the virtualizer to re-read row sizes
}, [streamingText]);

The result: smooth scrolling through 50,000 messages while the bottom one streams in at 200 tokens/sec.
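Since estimateSize runs on every scroll event, it's worth memoizing layout() results per prepared message and width. A sketch of such a cache (the WeakMap keying and the injected layoutFn are assumptions for this sketch, not Pretext API):

```typescript
// Cache layout results per prepared-text object and width, so the
// virtualizer's estimateSize can be called on every scroll event for free.
// `layoutFn` stands in for Pretext's layout(); it's injected so the cache
// is testable in isolation.
type LayoutResult = { height: number };

function createHeightCache<P extends object>(
  layoutFn: (prepared: P, width: number) => LayoutResult
) {
  const cache = new WeakMap<P, Map<number, LayoutResult>>();
  let misses = 0;
  return {
    get(prepared: P, width: number): LayoutResult {
      let byWidth = cache.get(prepared);
      if (!byWidth) {
        byWidth = new Map();
        cache.set(prepared, byWidth);
      }
      let result = byWidth.get(width);
      if (!result) {
        misses += 1;
        result = layoutFn(prepared, width);
        byWidth.set(width, result);
      }
      return result;
    },
    stats: () => ({ misses }),
  };
}

// Fake measurer: 8px per character, 24px per line.
const heights = createHeightCache((p: { chars: number }, width: number) => ({
  height: Math.ceil((p.chars * 8) / width) * 24,
}));

const msg = { chars: 500 };
const a = heights.get(msg, 600);
const b = heights.get(msg, 600); // cache hit: same object back
```

The streaming message's prepared object is replaced on every re-prepare, so its stale cached heights fall out of the WeakMap automatically.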

Code Block Reflow With Pre-Wrap

LLM responses frequently contain code blocks. Code blocks need pre-wrap semantics: preserve internal whitespace, treat newlines as hard breaks, wrap long lines that exceed the container width.

CSS handles this fine for static rendering. But for accurate height computation (needed for virtual scrolling), you need to measure with pre-wrap semantics. Pretext's prepare() accepts a whiteSpace: 'pre-wrap' option that does exactly this:

const codePrepared = prepare(codeBlock, "14px 'JetBrains Mono'", {
  whiteSpace: 'pre-wrap'
});
const { height } = layout(codePrepared, width, 1.5);

The wrapped lines and the height match what the browser will render with white-space: pre-wrap set. Use this for the height of message rows that contain code blocks.
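What pre-wrap semantics mean for line counting can be shown with a toy under a fixed-width-font assumption. This is an illustration only; Pretext's real pre-wrap handling works on shaped glyph widths, not character counts.

```typescript
// Toy pre-wrap line counter under a fixed-width-font assumption:
// \n is a hard break, internal whitespace is preserved (not collapsed),
// and any hard line wider than the container wraps into multiple rows.
function preWrapLineCount(text: string, maxCols: number): number {
  let lines = 0;
  for (const hardLine of text.split("\n")) {
    // An empty hard line still occupies one row.
    lines += Math.max(1, Math.ceil(hardLine.length / maxCols));
  }
  return lines;
}

const code = "function add(a, b) {\n  return a + b;\n}";
const rows = preWrapLineCount(code, 80);       // no wrapping needed
const narrowRows = preWrapLineCount(code, 10); // long lines wrap
```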

CJK-Correct Output Measurement

Models output in dozens of languages. A user might prompt in English and get a Japanese response, or vice versa. Each language has different break behavior: English breaks at spaces, Japanese and Chinese allow breaks between most characters (subject to kinsoku-style prohibition rules), and Thai has no spaces at all and needs dictionary-based segmentation.

The browser handles all of these. So does Pretext. But your custom code probably doesn't — if you wrote a "split text into lines by walking word boundaries" function, it works for English and breaks for everything else.

Use Pretext for the measurement of every chat message regardless of language. Set the locale via setLocale('ja-JP') etc. and Pretext applies the correct break rules.
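Browsers also expose locale-aware segmentation directly through the standard Intl.Segmenter API, which is a quick way to sanity-check break behavior per language (whether Pretext uses it internally is not something this page documents; the cast below is only needed if your TS lib target predates ES2022):

```typescript
// Locale-aware break segmentation via the standard Intl.Segmenter API.
// The cast keeps this compiling on older TS lib targets.
const Segmenter = (Intl as unknown as {
  Segmenter: new (locale: string, opts: { granularity: string }) => {
    segment(text: string): Iterable<{ segment: string }>;
  };
}).Segmenter;

function breakSegments(text: string, locale: string): string[] {
  const segmenter = new Segmenter(locale, { granularity: "word" });
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Japanese has no spaces, yet still yields multiple word-ish segments.
const ja = breakSegments("今日は良い天気です", "ja-JP");
const en = breakSegments("hello world", "en-US");
```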

Token-Aware Pre-Allocation

A more advanced pattern: when an LLM response starts streaming, you typically have an estimated total token count (from the model's max-tokens hint). You can pre-compute the height the message will have at the estimated final length, and reserve that height in the container, so streaming feels like text appearing in a pre-allocated container rather than the container growing as text appears.

import { useMemo } from 'react';
import { prepare, layout } from '@chenglou/pretext';

function PreallocatedStreamingMessage({ text, estimatedFinalLength, width, font }: Props) {
  // Estimate the final height with average-length placeholder words
  // (a run of bare spaces would collapse to a single line when wrapped)
  const placeholder = 'lorem '.repeat(Math.ceil(estimatedFinalLength / 6));
  const placeholderPrepared = useMemo(() => prepare(placeholder, font), [placeholder, font]);
  const { height: estimatedHeight } = useMemo(
    () => layout(placeholderPrepared, width, 1.5),
    [placeholderPrepared, width]
  );

  // Compute the actual height as text streams in
  const actualPrepared = useMemo(() => prepare(text, font), [text, font]);
  const { height: actualHeight } = useMemo(
    () => layout(actualPrepared, width, 1.5),
    [actualPrepared, width]
  );

  return (
    <div style={{ width, minHeight: estimatedHeight, height: actualHeight, fontFamily: font }}>
      {text}
    </div>
  );
}

The minHeight reserves space; the height grows into it. Pages below don't shift as the response streams.

Markdown Rendering With Accurate Heights

LLM output is typically markdown. You parse it, render to HTML or React elements, and the resulting layout is more complex than plain text — headers, lists, blockquotes, code blocks all affect height differently.

The pure-text Pretext approach doesn't handle all of this directly. But for the common case where you're rendering markdown into a known set of components, you can:

  1. Compute Pretext heights for each text node (paragraphs, list items, code blocks).
  2. Add the fixed heights of structural elements (margin, padding, header sizing).
  3. Sum to get the total message height.

This is more work than a single prepare() call over the raw markdown, but it's still all microseconds and gives you accurate virtual-scroll heights for messages with mixed content.
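A sketch of that sum. The textHeight() callback stands in for the prepare()+layout() pair, and the chrome values (margins, code-block padding) are hypothetical constants for the sketch, not anything Pretext defines:

```typescript
// Sum a markdown message's height from its block-level parts.
// textHeight() is a stand-in for prepare()+layout(); the chrome values
// (margins, code-block padding) are hypothetical.
type Block =
  | { kind: "paragraph"; text: string }
  | { kind: "codeBlock"; text: string }
  | { kind: "listItem"; text: string };

const CHROME = {
  paragraph: { marginY: 12 },
  codeBlock: { marginY: 12, padding: 16 },
  listItem: { marginY: 4 },
} as const;

function messageHeight(
  blocks: Block[],
  textHeight: (text: string, kind: Block["kind"]) => number
): number {
  let total = 0;
  for (const b of blocks) {
    const chrome = CHROME[b.kind];
    total += textHeight(b.text, b.kind) + chrome.marginY;
    if ("padding" in chrome) total += chrome.padding * 2; // top + bottom
  }
  return total;
}

// With a fake 24px-per-line measurer, the total is plain arithmetic:
const h = messageHeight(
  [
    { kind: "paragraph", text: "one line" },
    { kind: "codeBlock", text: "a\nb" },
  ],
  (text) => text.split("\n").length * 24
);
```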

Multimodal: Images Inline With Text

Some AI apps return responses that mix text and images. The image dimensions are known (from the response metadata or natural size), so you can lay out the surrounding text around them ahead of commit. This is the same code as the Magazine Layout demo — drop-cap-style text wrap, applied to AI output.

Where AI Apps Tend to Hit the Limits

A few honest limitations to call out:

What to Read Next

For framework-specific integration (the AI patterns above are React-flavored), see the Pretext + React guide. For the underlying API that powers all of this, see the Pretext API reference. For the architectural background that explains why streaming is fast, see How Pretext Works.

If you're benchmarking Pretext against your existing chat-render pipeline, the benchmarks page has reproducible code that compares to DOM measurement directly.

The library lives at github.com/chenglou/pretext and the npm package is at @chenglou/pretext.


pretext.cool is a community-maintained showcase, not affiliated with Cheng Lou or the official Pretext project.
