The single biggest application category for the Pretext text layout library, since its release, has been AI app frontends. ChatGPT-style streaming text, conversation history with thousands of turns, prompt playgrounds where the user pastes 10,000-token documents — these are the workloads that break naïve text rendering and where Pretext's measurement-during-render model pays for itself within hours of integration.
If you searched for "pretext ai" because you're building an AI chat UI, an LLM playground, or a document-analysis frontend, this page is for you. The patterns below are taken from production AI apps that integrated Pretext to fix specific symptoms.
AI apps share three properties that compound text-layout costs:
Property 1: Text arrives streaming, character by character or token by token. Each new token can change the wrap of the current line, the height of the current message, and (because of virtual scrolling) the position of every message above it.
Property 2: Messages are unbounded in length. A user can paste a novel; the model can return a multi-thousand-token response. A chat list mixes one-word "ok"s with twenty-screen code dumps.
Property 3: Conversation history is long. The full transcript can be hundreds of turns deep. Naïve scrolling — render every message — falls apart at the second hundred. You need virtualization, which needs accurate per-message heights.
These three properties together push you past what DOM-based measurement can reasonably handle. Pretext's two-phase model — prepare() once, layout() cheap — is exactly the shape that maps onto streaming + virtualizing.
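To make that shape concrete, here is a toy two-phase wrapper. This is a sketch, not Pretext code: it assumes fake monospace metrics, and `prepareMono`/`layoutMono` are illustrative names. The point is the split: the text-proportional work happens once, and the per-width pass is cheap enough to run per frame.

```typescript
// Toy illustration of a prepare/layout split, assuming every glyph is
// `charWidth` px wide. Not Pretext's implementation.

type PreparedMono = { words: { text: string; width: number }[]; spaceWidth: number };

// Phase 1: expensive, text-dependent, width-independent. Run once per text change.
function prepareMono(text: string, charWidth: number): PreparedMono {
  const words = text
    .split(/\s+/)
    .filter(Boolean)
    .map((w) => ({ text: w, width: w.length * charWidth }));
  return { words, spaceWidth: charWidth };
}

// Phase 2: cheap greedy wrap. Run on every width change or scroll tick.
function layoutMono(prepared: PreparedMono, maxWidth: number, lineHeight: number) {
  let lineCount = prepared.words.length > 0 ? 1 : 0;
  let lineWidth = 0;
  for (const word of prepared.words) {
    const needed = lineWidth === 0 ? word.width : lineWidth + prepared.spaceWidth + word.width;
    if (lineWidth > 0 && needed > maxWidth) {
      // Word doesn't fit: start a new line holding just this word.
      lineCount += 1;
      lineWidth = word.width;
    } else {
      lineWidth = needed;
    }
  }
  return { lineCount, height: lineCount * lineHeight };
}
```

Re-running only phase 2 when the container resizes, or when a virtualizer asks for a row height, is what keeps the per-frame cost bounded.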
The naïve approach: append each token to the current message's text, let the DOM re-layout, repeat. At 50 tokens/sec, that's 50 layouts/sec on a single message. Acceptable for short messages, painful for long ones (the layout cost grows with text length), catastrophic when the message is in a virtualized list (every layout invalidates the list's height calculations).
The Pretext approach: re-prepare on every token, re-layout to compute the new height, update the container's style.height accordingly. The cost is bounded: prepare() for a 1KB string is ~0.1ms, layout() is microseconds.
import { useMemo } from 'react';
import { prepare, layout } from '@chenglou/pretext';

function StreamingMessage({ text, width, font }: Props) {
  // Re-runs every time `text` changes (every new token)
  const prepared = useMemo(() => prepare(text, font), [text, font]);
  const { height } = useMemo(() => layout(prepared, width, 1.5), [prepared, width]);
  return (
    <div style={{ width, height, fontFamily: font, lineHeight: 1.5 }}>
      {text}
    </div>
  );
}
You can do better by not re-preparing on every token (prepare() is the most expensive call) and instead deferring re-prepare to a low-frequency tick:
function StreamingMessageDebounced({ text, width, font }: Props) {
  const [throttledText, setThrottledText] = useState(text);
  useEffect(() => {
    const id = requestAnimationFrame(() => setThrottledText(text));
    return () => cancelAnimationFrame(id);
  }, [text]);
  const prepared = useMemo(() => prepare(throttledText, font), [throttledText, font]);
  const { height } = useMemo(() => layout(prepared, width, 1.5), [prepared, width]);
  return (
    <div style={{ width, height, fontFamily: font, lineHeight: 1.5 }}>
      {text}
    </div>
  );
}
The text update is immediate; the height update is throttled to requestAnimationFrame (~16ms cadence). The visual result is identical to a per-token update, at one-third to one-tenth of the cost.
A common chat-UI pattern: stick to the bottom of the scroll while a message streams, but let the user scroll up to read history. The implementation pain: how do you know how much to scroll when the bottom message is growing in height?
Without Pretext: you scroll to a moving target. Either you measure the message after each render (forced layout, jank) or you over-scroll then bounce back (visible jitter).
With Pretext: you know the new height before commit. You can compute the exact scroll delta and apply it in the same frame.
function useStickyBottom(scrollRef: React.RefObject<HTMLDivElement>, height: number) {
  const lastHeightRef = useRef(height);
  useLayoutEffect(() => {
    const el = scrollRef.current;
    if (!el) return;
    const heightDelta = height - lastHeightRef.current;
    lastHeightRef.current = height;
    const isAtBottom = el.scrollHeight - el.scrollTop - el.clientHeight < 50;
    if (isAtBottom && heightDelta > 0) {
      el.scrollTop += heightDelta;
    }
  }, [height]);
}
height here is the Pretext-computed total height of the current streaming message. The hook keeps you stuck to the bottom without the bouncing animation.
The challenge that AI chat apps run into around their ten-thousandth message: the entire conversation needs to be virtualized (or rendering becomes a 10-second affair on entry), but the bottom message is streaming, which means its height is changing on every frame. Virtualizers handle the static case well; they handle the changing-bottom-row case poorly.
The Pretext-based architecture:
prepare() each message once and store the result on the message object. The virtualizer's estimateSize callback calls layout() for each visible row; Pretext's pure-JS layout is cheap enough that calling it per scroll event is fine.

const virtualizer = useVirtualizer({
  count: messages.length,
  getScrollElement: () => parentRef.current,
  estimateSize: (i) => layout(messages[i].prepared, width, 1.5).height + padding,
  overscan: 4,
});
// When the streaming message updates:
useEffect(() => {
  const lastIndex = messages.length - 1;
  messages[lastIndex].prepared = prepare(messages[lastIndex].text, font);
  virtualizer.measure(); // reset cached measurements so estimateSize re-runs
}, [streamingText]);
The result: smooth scrolling through 50,000 messages while the bottom one streams in at 200 tokens/sec.
LLM responses frequently contain code blocks. Code blocks need whitespace: pre-wrap semantics — preserve internal whitespace, treat newlines as hard breaks, wrap long lines that exceed the container width.
CSS handles this fine for static rendering. But for accurate height computation (needed for virtual scrolling), you need to measure with pre-wrap semantics. Pretext's prepare() accepts a whiteSpace: 'pre-wrap' option that does exactly this:
const codePrepared = prepare(codeBlock, "14px 'JetBrains Mono'", {
  whiteSpace: 'pre-wrap'
});
const { height } = layout(codePrepared, width, 1.5);
The wrapped lines and the height match what the browser will render with white-space: pre-wrap set. Use this for the height of message rows that contain code blocks.
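The pre-wrap rules themselves are simple enough to sketch as a toy line counter. This is not Pretext internals: it assumes monospace metrics and wraps at the character level, where a real engine wraps at word boundaries. It shows the two pre-wrap behaviors that matter for height: newlines are hard breaks, and segments longer than the column count wrap.

```typescript
// Toy pre-wrap line counting: '\n' is a hard break, internal whitespace
// counts toward line length, and any segment wider than `cols` wraps.
// Character-level wrapping for simplicity; not how Pretext breaks lines.
function preWrapLineCount(text: string, cols: number): number {
  return text
    .split('\n')
    .map((seg) => Math.max(1, Math.ceil(seg.length / cols)))
    .reduce((a, b) => a + b, 0);
}
```

Multiply the count by the line height and you have the toy analogue of the layout() call above.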
Models output in dozens of languages. A user might prompt in English and get a Japanese response, or vice versa. Each language has different break behavior: English breaks at spaces and hyphens; Chinese and Japanese allow breaks between most characters (with prohibitions around certain punctuation); Thai and Khmer have no inter-word spaces and need dictionary-based segmentation; Korean mixes space-delimited words with CJK characters.
The browser handles all of these. So does Pretext. But your custom code probably doesn't: if you wrote a "split text into lines by walking word boundaries" function, it works for English and breaks for everything else.
Use Pretext for the measurement of every chat message regardless of language. Set the locale via setLocale('ja-JP') etc. and Pretext applies the correct break rules.
A more advanced pattern: when an LLM response starts streaming, you typically have an estimated total token count (from the model's max-tokens hint). You can pre-compute the height the message will have at the estimated final length, and reserve that height in the container, so streaming feels like text appearing in a pre-allocated container rather than the container growing as text appears.
function PreallocatedStreamingMessage({ text, estimatedFinalLength, width, font }: Props) {
  // Compute the height it'll have at the estimated final length. A run of
  // spaces is a crude stand-in for average text; memoized so the placeholder
  // string isn't rebuilt on every render.
  const placeholderPrepared = useMemo(
    () => prepare(' '.repeat(estimatedFinalLength), font),
    [estimatedFinalLength, font]
  );
  const { height: estimatedHeight } = useMemo(
    () => layout(placeholderPrepared, width, 1.5),
    [placeholderPrepared, width]
  );
  // Compute the actual height as text streams in
  const actualPrepared = useMemo(() => prepare(text, font), [text, font]);
  const { height: actualHeight } = useMemo(
    () => layout(actualPrepared, width, 1.5),
    [actualPrepared, width]
  );
  return (
    <div style={{ width, minHeight: estimatedHeight, height: actualHeight, fontFamily: font }}>
      {text}
    </div>
  );
}
The minHeight reserves space; the height grows into it. Pages below don't shift as the response streams.
LLM output is typically markdown. You parse it, render to HTML or React elements, and the resulting layout is more complex than plain text — headers, lists, blockquotes, code blocks all affect height differently.
The pure-text Pretext approach doesn't handle all of this directly. But for the common case where you're rendering markdown into a known set of components, you can: parse the markdown into blocks, prepare() each block's text with that block's font, layout() each block at its effective width (indented for lists and blockquotes, pre-wrap for code), and sum the block heights plus vertical margins.
This is more work than prepare(rawMarkdown).height, but it's still all microseconds and gives you accurate virtual-scroll heights for messages with mixed content.
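The summing step looks like the helper below. The block shape is an assumption for illustration, not Pretext API: each entry carries the height from layout() plus the vertical margins of the component that will render it.

```typescript
// Illustrative helper: total height of a message row built from measured
// markdown blocks, collapsing adjacent sibling margins the way CSS does.
type MeasuredBlock = { height: number; marginTop: number; marginBottom: number };

function messageRowHeight(blocks: MeasuredBlock[]): number {
  if (blocks.length === 0) return 0;
  let total = blocks[0].marginTop;
  for (let i = 0; i < blocks.length; i++) {
    total += blocks[i].height;
    const next = blocks[i + 1];
    // Between siblings, CSS collapses the two margins to the larger one.
    total += next ? Math.max(blocks[i].marginBottom, next.marginTop) : blocks[i].marginBottom;
  }
  return total;
}
```

Feed the result into the virtualizer's estimateSize callback in place of the plain-text height.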
Some AI apps return responses that mix text and images. The image dimensions are known (from the response metadata or natural size), and you can use Pretext to lay out the surrounding text correctly:
Call layoutNextLine per line, with the available width for each line determined by whether the image occupies that y-position. Alongside the image, the available width is containerWidth - imageWidth - gap; below the image, it's the full containerWidth. This is the same code as the Magazine Layout demo: drop-cap-style text wrap, applied to AI output.
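The per-line width rule is a pure function; a sketch (the names and shapes here are illustrative, not Pretext API):

```typescript
// A line that vertically overlaps the floated image gets the narrow column;
// any line below the image gets the full container width.
type ImageRect = { top: number; height: number; width: number };

function availableWidth(
  lineTop: number,
  lineHeight: number,
  containerWidth: number,
  image: ImageRect,
  gap: number
): number {
  const overlapsImage =
    lineTop < image.top + image.height && lineTop + lineHeight > image.top;
  return overlapsImage ? containerWidth - image.width - gap : containerWidth;
}
```

Feed this width to each successive line call, advancing the y-position by the line height as you go.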
A few honest limitations to call out:
For framework-specific integration (the AI patterns above are React-flavored), see the Pretext + React guide. For the underlying API that powers all of this, see the Pretext API reference. For the architectural background that explains why streaming is fast, see How Pretext Works.
If you're benchmarking Pretext against your existing chat-render pipeline, the benchmarks page has reproducible code that compares to DOM measurement directly.
The library lives at github.com/chenglou/pretext and the npm package is at @chenglou/pretext.
pretext.cool is a community-maintained showcase, not affiliated with Cheng Lou or the official Pretext project.