LangChain

Interacting with LLMs

Parsing Model Output

In this lesson, we focus on transforming a language model’s plain-text responses into structured formats like JSON, XML, YAML, or CSV. Although Large Language Models (LLMs) always return text, downstream applications typically require data in a predictable schema. By embedding clear formatting instructions in your prompt and using LangChain’s OutputParser, you can automate this workflow end-to-end.

The image is a diagram illustrating the flow of information between a user, input, model I/O, output, and a language model, with a focus on "Always Text."

Why Structured Output Matters

  • Interoperability: Structured data (JSON, XML, YAML) integrates seamlessly with APIs and databases.
  • Reliability: Reduces parsing errors and unexpected values at runtime.
  • Maintainability: Clear schemas make it easier to validate and extend your data model.

Note

Large language models always return text. To work with objects, you need to parse and validate that text.

How LangChain’s OutputParser Works

LangChain’s OutputParser automates both prompt construction and response transformation:

  1. Prompt Construction
    You define a schema and example responses inside a PromptTemplate. The model then knows exactly which structure (e.g., JSON with specific fields) to produce.

  2. Response Transformation
    After receiving the text output, the parser converts it into your target data type (e.g., Python dict, XML DOM, YAML mapping), handling parsing errors and edge cases.

# Note: import paths vary between LangChain versions; these match the classic (pre-0.1) layout
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# 1. Define the schema you expect
schemas = [
    ResponseSchema(name="title", description="Title of the article"),
    ResponseSchema(name="tags", description="List of relevant tags"),
]

# 2. Create an output parser
parser = StructuredOutputParser.from_response_schemas(schemas)

# 3. Build a prompt that includes instructions and examples
template = """
Generate an article summary:

{format_instructions}

Article:
\"\"\"
{article_text}
\"\"\"
"""
prompt = PromptTemplate(
    template=template,
    input_variables=["article_text"],
    # partial_variables expects a dict mapping variable names to values
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

# 4. Run the chain and parse (assumes `llm` is an already-initialized model object)
chain = LLMChain(llm=llm, prompt=prompt)
output = chain.run(article_text="...your content here...")
result = parser.parse(output)

The image illustrates a comparison between an "Internal Python Data Structure" represented by the Python logo and "More Structured Markup" represented by XML and YAML icons.

Common Output Formats

Format | Description                       | Example
JSON   | Widely used, machine-readable     | { "name": "Alice", "age": 30 }
XML    | Markup-based, verbose             | <person><name>Alice</name></person>
YAML   | Human-friendly, indentation-based | name: Alice
CSV    | Tabular data, comma-separated     | name,age\nAlice,30
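To make the table concrete, the JSON, XML, and CSV examples above can be parsed with Python's standard library alone; YAML usually requires a third-party package such as PyYAML (`yaml.safe_load`). A quick sketch:

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# JSON: widely used, machine-readable
person = json.loads('{ "name": "Alice", "age": 30 }')
print(person["name"])  # Alice

# XML: markup-based, verbose
root = ET.fromstring("<person><name>Alice</name></person>")
print(root.find("name").text)  # Alice

# CSV: tabular, comma-separated (note: values parse as strings)
rows = list(csv.DictReader(io.StringIO("name,age\nAlice,30")))
print(rows[0]["age"])  # 30
```

Notice that only JSON preserves types out of the box (`age` is an `int` from JSON but a string from CSV), which is one reason JSON is the most common target for model output.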

Benefits of Using OutputParser

  • Predictability: Enforces schema so you avoid malformed data.
  • Error Handling: Catches parsing exceptions early and returns structured error messages.
  • Extensibility: Easily swap or update schemas without rewriting parsing logic.
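To make the error-handling benefit concrete, here is a minimal, hypothetical sketch of what a structured parser does internally: locate the JSON object in the model's raw text, decode it, and fail early with a clear message if anything is missing or malformed. This is an illustration of the idea, not LangChain's actual implementation:

```python
import json
import re


class ParseError(ValueError):
    """Raised when the model's text cannot be parsed into the expected schema."""


def parse_structured(text: str, required_keys: set[str]) -> dict:
    # Accept either a bare JSON object or one wrapped in a ```json fence
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ParseError("no JSON object found in model output")
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError as exc:
        raise ParseError(f"malformed JSON: {exc}") from exc
    missing = required_keys - data.keys()
    if missing:
        raise ParseError(f"missing required keys: {sorted(missing)}")
    return data


# Well-formed output parses cleanly...
ok = parse_structured('```json\n{"title": "Intro", "tags": ["ai"]}\n```', {"title", "tags"})
print(ok["title"])  # Intro

# ...while a refusal or malformed reply fails early with a structured error
try:
    parse_structured("Sorry, I cannot answer that.", {"title", "tags"})
except ParseError as err:
    print(err)
```

Catching parse failures at this boundary, rather than letting bad data flow downstream, is exactly the predictability benefit listed above.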

Warning

Always include format_instructions from StructuredOutputParser in your prompt. Omitting them can lead to inconsistent model responses and parsing failures.

Next Steps

  1. Experiment with custom ResponseSchema definitions for your use case.
  2. Validate parsed output against JSON Schema or your own validators.
  3. Integrate the parser into your production pipeline for reliable data ingestion.
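For step 2, a lightweight validator can be written without extra dependencies (the third-party `jsonschema` package offers a fuller JSON Schema implementation). The field names below match the earlier `title`/`tags` example schema; this is a sketch, not a production validator:

```python
def validate_article(result: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the result is valid."""
    errors = []
    title = result.get("title")
    if not isinstance(title, str) or not title.strip():
        errors.append("title must be a non-empty string")
    tags = result.get("tags")
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        errors.append("tags must be a list of strings")
    return errors


print(validate_article({"title": "Intro to LangChain", "tags": ["llm", "parsing"]}))  # []
print(validate_article({"title": "", "tags": "llm"}))  # two errors
```

Running the parsed output through a validator like this before ingestion catches schema drift even when the model's text parses successfully.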
