This article explains batching in LangChain, which lets you send multiple prompts simultaneously for faster LLM responses.
This article builds on our previous examples of synchronous chain invocation and streaming output. Here, we'll dive into batching: sending multiple prompts at once so that LangChain can parallelize the LLM calls, often completing the entire batch in roughly the time of a single prompt.
Batching allows you to submit a list of prompt dictionaries in one call. Instead of invoking your chain for each prompt sequentially, you pass them all to chain.batch(...) and let LangChain handle parallel execution under the hood.
Batching is ideal when you have multiple similar prompts (e.g., summarization, question-answering) and want to reduce overall latency. Ensure your LLM provider supports concurrent requests.
Below is a simple example of setting up a prompt chain and running batched inference:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chat_models import ChatOpenAI

# Define your prompt template, LLM, and output parser if needed
prompt = PromptTemplate(
    input_variables=["question"],
    template="Answer the following question concisely: {question}",
)

# gpt-4 is a chat model, so use ChatOpenAI rather than the completion-style OpenAI class
llm = ChatOpenAI(model_name="gpt-4")
chain = LLMChain(prompt=prompt, llm=llm)

# Prepare a list of prompt dictionaries
questions = [
    {"question": "Tell me about The Godfather movie."},
    {"question": "Tell me about the Avatar movie."},
]

# Batch inference: all prompts are dispatched concurrently
responses = chain.batch(questions)
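By default, batch runs the calls concurrently. If your provider rate-limits you, you can cap the number of parallel requests with the max_concurrency config option that LangChain's Runnable interface accepts. A minimal sketch, reusing the chain and questions defined above:

# Allow at most 2 LLM calls in flight at a time
responses = chain.batch(questions, config={"max_concurrency": 2})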
Each entry in responses corresponds to one prompt. Note that LLMChain returns a dictionary per input, with the generated text under the "text" key:
# Print the text of the first response
print(responses[0]["text"])
Output:
The Godfather is a classic American crime film directed by Francis Ford Coppola, released in 1972. It follows the Corleone family and their patriarch Vito Corleone (Marlon Brando) passing control of the empire to Michael (Al Pacino). Renowned for its storytelling, performances, and cinematography, it's often cited as one of the greatest films ever made.
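For larger batches, a single failed call would normally raise an exception and discard the remaining results. Passing return_exceptions=True tells batch to return the exception object in place of the failed entry, so you can handle errors per prompt. A minimal sketch, again reusing the chain and questions from above:

# Collect exceptions in-place instead of raising on the first failure
results = chain.batch(questions, return_exceptions=True)

for q, result in zip(questions, results):
    if isinstance(result, Exception):
        print(f"Failed on {q['question']!r}: {result}")
    else:
        print(result["text"])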