Getting Structured Output From LLMs: JSON and Functions
Large language models are wonderful at producing fluent prose and terrible, by default, at producing data your program can consume. The moment you want to wire an LLM into a pipeline — extracting fields from an invoice, routing a support ticket, populating a database — you stop caring about eloquent sentences and start caring about whether you got back a valid object with the keys you expected. This is the problem of structured output, and over the last few years the tooling around it has matured from "beg the model and parse defensively" into something genuinely reliable.
This post walks through the practical techniques for getting structured data out of LLMs: prompting for JSON, using schema-constrained generation, and leaning on function/tool calling. I'll cover what works, what breaks, and how to build systems that don't fall over the first time a model decides to wrap its answer in a chatty preamble.
Why Free-Form Text Is the Enemy
If you ask a model "What's the sentiment and key topics of this review?" you might get a paragraph, a bulleted list, or a Markdown table — and the format can change between calls, between model versions, and even between identical prompts at higher temperatures. Parsing that with regexes is a losing game.
Structured output flips the contract. Instead of asking for an answer in prose, you ask for an answer that conforms to a shape you define in advance. Your downstream code can then trust the shape and fail loudly when it's violated, rather than silently mishandling an unexpected format.
There are three broad strategies, in roughly increasing order of reliability:
- Prompt-and-parse — ask for JSON in the prompt, then parse the response.
- Native JSON / schema modes — tell the API to constrain output to valid JSON, optionally against a schema.
- Function (tool) calling — define typed functions the model "calls," receiving validated arguments.
Strategy 1: Prompt for JSON
The simplest approach is to ask. A surprisingly effective prompt pattern:
Extract the following fields and respond with ONLY a JSON object,
no markdown fences, no commentary:
- sentiment: one of "positive", "negative", "neutral"
- topics: array of short strings
Review: "The battery life is incredible but the camera disappointed me."
This works most of the time, but "most of the time" is exactly the problem. Common failure modes:
- The model wraps the JSON in ```json fences.
- It adds a friendly "Here's the JSON you requested:" preamble.
- It emits trailing commas, single quotes, or comments — none of which are valid JSON.
- It hallucinates extra keys or omits required ones.
Defensive parsing helps: strip code fences, locate the first { and last }, and use a tolerant parser. But you're still building a fortress around an unreliable foundation. Use prompt-and-parse only when you can't access better mechanisms, and always validate the result against a schema after parsing.
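As a rough illustration of that defensive step, here is a minimal sketch using only Python's standard library. The function name extract_json_object is made up for this post, and the sketch deliberately stops short of a tolerant parser: json.loads is strict, so trailing commas or single quotes still fail and should trigger a retry.

```python
import json


def extract_json_object(text: str) -> dict:
    """Best-effort recovery of one JSON object from a chatty model reply."""
    # Slicing from the first "{" to the last "}" discards code fences,
    # preambles, and trailing commentary in a single step.
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # json.loads is strict: trailing commas, single quotes, and comments
    # raise json.JSONDecodeError, which is a subclass of ValueError.
    return json.loads(text[start:end + 1])


reply = (
    "Here is the JSON you requested:\n"
    '{"sentiment": "negative", "topics": ["battery", "camera"]}\n'
    "Hope that helps!"
)
print(extract_json_object(reply))
```

Because every failure surfaces as a ValueError, the caller can treat "could not parse" and "parsed but invalid" the same way and hand both to one retry path.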
Strategy 2: Native JSON and Schema Modes
Most modern APIs offer a mode that guarantees syntactically valid JSON. Some go further and let you supply a JSON Schema that the output is constrained to satisfy — meaning the keys, types, and enums are enforced during generation, not checked afterward. This is often called "structured outputs" or "guided/constrained decoding."
The mechanism under the hood is constrained sampling: at each token step, the decoder masks out any token that would violate the grammar implied by your schema. The model literally cannot produce "sentment" if your schema says the key is "sentiment". This eliminates an entire class of bugs.
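To make the masking idea concrete, here is a toy sketch. It is not how any real decoder is implemented: production systems mask model tokens against a grammar compiled from the whole schema, while this version works character by character over a fixed list of allowed keys, and typo_prone_score is a made-up stand-in for a model that "wants" to write the misspelled key. It also assumes no allowed value is a prefix of another.

```python
def allowed_next_chars(prefix: str, allowed_values: list[str]) -> set[str]:
    """Characters that keep `prefix` a prefix of at least one allowed value."""
    return {
        value[len(prefix)]
        for value in allowed_values
        if value.startswith(prefix) and len(value) > len(prefix)
    }


def constrained_greedy_decode(score, allowed_values: list[str]) -> str:
    """Greedy decoding with disallowed characters masked out at every step.

    `score(prefix, ch)` stands in for the model's preference for `ch`.
    Assumes no allowed value is a prefix of another allowed value.
    """
    out = ""
    while out not in allowed_values:
        candidates = sorted(allowed_next_chars(out, allowed_values))
        if not candidates:
            raise ValueError("no allowed continuation")
        # The mask: only `candidates` compete, whatever the raw scores say.
        out += max(candidates, key=lambda ch: score(out, ch))
    return out


def typo_prone_score(prefix: str, ch: str) -> float:
    """Hypothetical model that prefers spelling the key as 'sentment'."""
    intended = "sentment"
    position = len(prefix)
    return 1.0 if position < len(intended) and intended[position] == ch else 0.0


KEYS = ["sentiment", "topics"]
print(constrained_greedy_decode(typo_prone_score, KEYS))  # prints "sentiment"
```

After "sent", the scorer's favorite character is "m", but the only character that keeps the output on a valid path is "i", so the typo never gets a chance to be sampled.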
A schema for our review example might look like:
{
  "type": "object",
  "properties": {
    "sentiment": {
      "type": "string",
      "enum": ["positive", "negative", "neutral"]
    },
    "topics": {
      "type": "array",
      "items": { "type": "string" }
    }
  },
  "required": ["sentiment", "topics"],
  "additionalProperties": false
}
A few practical notes that bite people:
- Set additionalProperties: false if you want to forbid surprise keys. Many schema modes require this for strict enforcement.
- Mark fields required generously. "Optional" fields often get dropped, and an absent key is harder to handle than a present null.
- Keep schemas reasonably flat. Deeply nested, recursive schemas can degrade quality and occasionally hit depth limits. Flatten where you can.
- Schemas constrain shape, not truth. The model will produce a valid sentiment, but it can still produce the wrong sentiment. Structure is not correctness.
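The first two notes are easy to check mechanically before a schema ever reaches an API. The sketch below is a small lint written for this post; strictness_issues is a hypothetical helper, and the rules it encodes are the notes above rather than any particular provider's requirements, which vary.

```python
def strictness_issues(schema: dict, path: str = "$") -> list[str]:
    """List places where an object schema allows surprise or missing keys."""
    issues: list[str] = []
    if schema.get("type") == "object":
        properties = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            issues.append(f"{path}: additionalProperties is not false")
        not_required = sorted(set(properties) - set(schema.get("required", [])))
        if not_required:
            issues.append(f"{path}: not marked required: {', '.join(not_required)}")
        # Recurse so nested objects are held to the same standard.
        for name, subschema in properties.items():
            issues.extend(strictness_issues(subschema, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        issues.extend(strictness_issues(schema["items"], f"{path}[]"))
    return issues


REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["positive", "negative", "neutral"]},
        "topics": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["sentiment", "topics"],
    "additionalProperties": False,
}

print(strictness_issues(REVIEW_SCHEMA))  # prints []
```

Running a check like this in a unit test catches the loose schema at build time instead of as a puzzling API error or a silently dropped field.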
Strategy 3: Function (Tool) Calling
Function calling reframes structured output as "the model wants to invoke a tool, here are the arguments." You define a function with a typed parameter schema; the model responds with a structured call containing arguments that match. Even if you never actually execute a function, this is one of the cleanest ways to extract typed data.
Conceptually you define:
{
  "name": "record_review_analysis",
  "description": "Store the structured analysis of a customer review.",
  "parameters": {
    "type": "object",
    "properties": {
      "sentiment": { "type": "string", "enum": ["positive", "negative", "neutral"] },
      "topics": { "type": "array", "items": { "type": "string" } }
    },
    "required": ["sentiment", "topics"]
  }
}
The model returns a tool call with arguments conforming to parameters. You parse those arguments — already JSON, often already schema-validated — and proceed.
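On the receiving side, handling that call might look something like the sketch below. The tool_call dictionary shape (a name plus arguments) is an assumption made for illustration; each provider wraps tool calls in its own response format, and some deliver the arguments as a JSON-encoded string while others deliver an already-parsed object.

```python
import json


def record_review_analysis(sentiment: str, topics: list[str]) -> str:
    """Stand-in for real storage logic; returns a summary string."""
    return f"{sentiment}: {', '.join(topics)}"


# Map tool names the model may request to local Python callables.
HANDLERS = {"record_review_analysis": record_review_analysis}


def dispatch_tool_call(tool_call: dict) -> str:
    """Route a (hypothetically shaped) tool call to its handler."""
    name = tool_call["name"]
    if name not in HANDLERS:
        raise ValueError(f"model requested unknown tool: {name}")
    arguments = tool_call["arguments"]
    if isinstance(arguments, str):
        # Arguments arrived as a JSON-encoded string; decode them first.
        arguments = json.loads(arguments)
    # Unexpected or missing argument names raise TypeError here.
    return HANDLERS[name](**arguments)


call = {
    "name": "record_review_analysis",
    "arguments": '{"sentiment": "negative", "topics": ["battery", "camera"]}',
}
print(dispatch_tool_call(call))  # prints "negative: battery, camera"
```

The explicit lookup table matters: the model only ever picks a name, and your code decides which names map to real behavior.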
Function calling shines when:
- You have multiple possible actions and want the model to choose one (routing, agents).
- You want strong descriptions to steer behavior: the description fields act as inline documentation the model reads.
- You're building agentic loops where the model calls a tool, sees the result, and continues.
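The line between "schema-constrained JSON" and "function calling" is increasingly blurry: where a provider offers strict tool schemas, both can rely on the same constrained-decoding machinery. Without strict enforcement, tool arguments are only as reliable as the model that wrote them, which is one more reason to validate. Choose function calling when the action framing fits your problem, and plain structured output when you just want data back.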
Validate Anyway
No matter which strategy you pick, validate on your side. Define your schema once in your application language — using a library like Pydantic, Zod, or a JSON Schema validator — and run every model response through it. Two reasons:
- Defense in depth. Constrained decoding is good, not infallible; edge cases, truncation from hitting token limits, and provider quirks happen.
- Single source of truth. Generate the API-facing schema from your validation model so they can never drift apart.
When validation fails, have a retry strategy: re-prompt the model with the validation error message ("your previous output failed because topics was missing"). Models are remarkably good at self-correcting when handed the specific error.
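A validate-and-retry loop along those lines could be sketched as follows. To stay dependency-free it uses a hand-written validator where you would normally reach for Pydantic, Zod, or a JSON Schema validator, and call_model is a placeholder for whatever function sends a prompt to your provider and returns the raw text.

```python
import json
from typing import Callable

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}
EXPECTED_KEYS = {"sentiment", "topics"}


def validate_review(data: object) -> dict:
    """Return `data` if it is a valid review analysis, else raise ValueError."""
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    missing = sorted(EXPECTED_KEYS - set(data))
    if missing:
        raise ValueError(f"missing required keys: {missing}")
    extra = sorted(set(data) - EXPECTED_KEYS)
    if extra:
        raise ValueError(f"unexpected keys: {extra}")
    sentiment = data["sentiment"]
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError("sentiment must be one of positive, negative, neutral")
    topics = data["topics"]
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("topics must be an array of strings")
    return data


def extract_with_retries(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model, validate, and re-prompt with the specific error on failure."""
    current_prompt = prompt
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            return validate_review(json.loads(raw))
        except ValueError as err:  # also covers json.JSONDecodeError
            last_error = err
            current_prompt = (
                f"{prompt}\n\nYour previous output failed validation: {err}\n"
                "Return corrected JSON only."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts: {last_error}")
```

Capping the attempts keeps a persistently confused model from turning into an unbounded bill, and raising at the end preserves the "fail loudly" contract.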
Practical Tips That Save Headaches
- Lower the temperature. For extraction tasks, temperature 0 (or close) gives more deterministic, parseable output.
- Watch your token limits. A response truncated mid-object is invalid JSON. Budget enough output tokens for the largest plausible result, especially with long arrays.
- Prefer enums over free strings wherever the value set is known. It cuts hallucination dramatically.
- Use descriptions as instructions. Field-level descriptions in your schema are read by the model, so "description": "ISO 8601 date, e.g. 2026-06-26" meaningfully improves formatting.
- Test against adversarial inputs. Empty strings, non-English text, and content that has no valid answer all reveal whether your schema and prompt handle the unhappy path.
- Stream carefully. If you stream structured output, you can't parse it until it's complete (or you need an incremental JSON parser). Plan your UX accordingly.
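The truncation and streaming tips share one practical consequence: you need to know whether the text you hold is a complete object yet. Here is a simple sketch of a buffer that answers that question. JsonObjectStream is a name invented for this post, it assumes the top-level value is an object, and it re-parses the whole buffer on every chunk, which is fine for small payloads but is not a true incremental parser.

```python
import json
from typing import Optional


class JsonObjectStream:
    """Accumulates streamed text and reports when it forms a complete JSON object."""

    def __init__(self) -> None:
        self._buffer = ""
        self.result: Optional[dict] = None

    def feed(self, chunk: str) -> Optional[dict]:
        """Add a chunk; return the parsed object once complete, else None."""
        self._buffer += chunk
        try:
            parsed = json.loads(self._buffer)
        except json.JSONDecodeError:
            return None  # incomplete so far (or malformed); keep waiting
        if isinstance(parsed, dict):
            self.result = parsed
        return self.result

    def finish(self) -> dict:
        """Call when the stream ends; raises if the object never completed."""
        if self.result is None:
            raise ValueError(
                "stream ended before a complete JSON object arrived; "
                "the output was likely truncated"
            )
        return self.result


stream = JsonObjectStream()
for chunk in ['{"sentiment": "neg', 'ative", "topics": ', '["camera"]}']:
    stream.feed(chunk)
print(stream.finish())
```

A ValueError from finish() is the signal to check whether the response hit its output token limit before retrying with a larger budget.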
A Decision Heuristic
- Need data back and your provider supports schema-constrained output? Use it. It's the highest reliability for the least code.
- Building an agent or routing between actions? Use function/tool calling.
- Stuck with an older model or an API without structured modes? Prompt-and-parse, then validate hard with retries.
In all three cases, the constant is validation. Treat the model's output as untrusted input crossing a boundary into your system — because that's exactly what it is.
FAQ
Is schema-constrained output slower than plain generation?
There's usually a small overhead from constrained decoding and occasionally from schema compilation on the first call, but it's typically negligible compared to the cost of parsing failures and retries you'd otherwise incur. In practice it often speeds up your system by eliminating retry loops.
Does forcing a schema hurt answer quality?
It can, slightly, if the schema is awkward or overly rigid — the model spends "effort" satisfying structure instead of reasoning. Mitigate this by letting the model reason first: include a reasoning or scratchpad string field early in the schema, then the structured fields. The model thinks "out loud" in that field before committing to values.
What about nested or recursive data, like a tree?
Most schema modes support nesting and some support recursion, but quality drops as depth grows. If you need deep structures, consider generating them in stages (one level per call) or flattening into an array of nodes with parent IDs.
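If you take the flattening route, rebuilding the tree on your side is a few lines of code. The sketch below assumes each node carries an id and a parent_id (null for roots); those field names are illustrative choices, not a convention any API imposes.

```python
def build_tree(nodes: list[dict]) -> list[dict]:
    """Rebuild nested trees from a flat list of nodes with parent IDs.

    Each input node needs an "id" and a "parent_id" (None marks a root).
    Cycles are not detected: nodes in a cycle never attach to a root.
    """
    # Copy each node so the caller's data is left untouched.
    by_id = {node["id"]: {**node, "children": []} for node in nodes}
    roots = []
    for node in by_id.values():
        parent_id = node["parent_id"]
        if parent_id is None:
            roots.append(node)
        elif parent_id in by_id:
            by_id[parent_id]["children"].append(node)
        else:
            raise ValueError(
                f"node {node['id']} references missing parent {parent_id}"
            )
    return roots


flat = [
    {"id": 1, "parent_id": None, "label": "Electronics"},
    {"id": 2, "parent_id": 1, "label": "Phones"},
    {"id": 3, "parent_id": 2, "label": "Batteries"},
]
tree = build_tree(flat)
print(tree[0]["children"][0]["children"][0]["label"])  # prints "Batteries"
```

The missing-parent check doubles as validation: a dangling parent_id is exactly the kind of structurally valid but wrong output a schema alone cannot catch.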
Why did I get valid JSON with completely wrong values?
Because structure and correctness are independent. Constrained decoding guarantees the shape; it does nothing for accuracy. Improve correctness with better prompts, examples (few-shot), lower temperature, and a stronger model, not with a stricter schema.
Should I use JSON mode or function calling if both are available?
Pick based on framing. If you simply want a typed result, plain structured/JSON output is more direct. If the model is choosing among actions or you're in an agent loop, function calling is the natural fit. Where strict schema enforcement is available for both, they draw on the same constrained-decoding machinery, so neither is "more correct."
How do I handle the model refusing or having no valid answer?
Build it into the schema. Add a nullable field or an explicit "status": "no_answer" enum value so "I can't answer" is itself a valid, structured response — rather than forcing the model to fabricate data to satisfy required fields.
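One possible shape for this, written as a Python dictionary so it can sit next to the handling code: the status field and the nullable sentiment are illustrative choices, and whether a given provider's schema mode accepts union types like ["string", "null"] is something to confirm in its documentation.

```python
from typing import Optional

# Hypothetical variant of the review schema with an explicit escape hatch.
REVIEW_SCHEMA_WITH_STATUS = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["ok", "no_answer"]},
        "sentiment": {
            "type": ["string", "null"],
            "enum": ["positive", "negative", "neutral", None],
        },
        "topics": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["status", "sentiment", "topics"],
    "additionalProperties": False,
}


def interpret(result: dict) -> Optional[tuple]:
    """Return (sentiment, topics), or None when the model declined to answer."""
    if result["status"] == "no_answer":
        return None
    if result["sentiment"] is None:
        # The schema permits this combination, so the application must reject it.
        raise ValueError("status is ok but sentiment is null")
    return result["sentiment"], result["topics"]


print(interpret({"status": "no_answer", "sentiment": None, "topics": []}))  # prints None
```

The extra check in interpret is the point: a schema can make "no answer" representable, but cross-field rules such as "ok implies a non-null sentiment" still belong in your own validation.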