- Where should preprocessing and post-processing run?
- How can you combine those steps into a single inference request?
A SageMaker inference pipeline composes a sequence of containers (preprocessing → model → postprocessing) that runs within a single real-time request. It is distinct from SageMaker Pipelines, which orchestrates training and other ML workflow steps.
Conceptual flow
A SageMaker real-time endpoint can host multiple containers on the same instance. You define the execution order so each incoming request passes through the containers in sequence:
- The endpoint receives the inference request and forwards it to the first container (preprocessing).
- The preprocessing container transforms the raw input (scaling, encoding, feature generation) and returns transformed data.
- The transformed data is passed to the model container, which runs inference and emits predictions.
- The model output is forwarded to the postprocessing container, which formats or filters predictions for the client.
SageMaker handles the inter-container data flow. Containers simply accept input and return output; SageMaker wires them together in the order you specify.
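The hand-off pattern described above can be sketched in plain Python, with no SageMaker dependency. The three functions below are stand-ins for the containers, not real SageMaker code; the point is only that each stage consumes the previous stage's output:

```python
# Minimal sketch of the hand-off pattern: each "container" is a function
# that takes the previous stage's output, mirroring how SageMaker pipes
# one container's response into the next container's request.
from functools import reduce

def preprocess(raw):
    # Stand-in for the preprocessing container: scale a list of numbers.
    return [x / 10.0 for x in raw]

def model(features):
    # Stand-in for the model container: sum the features as a "prediction".
    return {"prediction": sum(features)}

def postprocess(output):
    # Stand-in for the postprocessing container: format for the client.
    return {"result": round(output["prediction"], 2), "status": "ok"}

def invoke_pipeline(request, containers):
    # SageMaker-style chaining: output of container N is input of N+1.
    return reduce(lambda payload, container: container(payload), containers, request)

response = invoke_pipeline([10, 20, 30], [preprocess, model, postprocess])
print(response)  # {'result': 6.0, 'status': 'ok'}
```

Reordering the list passed to `invoke_pipeline` changes the execution sequence, which is exactly the role the container order plays in a real pipeline definition.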
Defining an inference pipeline
You define the pipeline in a JSON file that lists the containers in the sequence you want them to run. When you deploy the model (for example with deploy() in the SageMaker Python SDK), you provide a reference to that pipeline definition; deployment and endpoint configuration are otherwise the same. The pipeline JSON tells SageMaker which containers to invoke and in what order. Key points about the definition:
- The array order in “Containers” defines the execution sequence; output from container N becomes the input for container N+1.
- Each container must implement the SageMaker inference contract (process input, produce output).
- You can set container-specific Environment variables to configure behavior (e.g., scaling parameters or model hyperparameters).
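The points above can be illustrated with a minimal definition. This sketch follows the shape of the CreateModel API's “Containers” array; the model name, role ARN, image URIs, S3 path, and environment values are placeholders, not real resources:

```json
{
  "ModelName": "my-inference-pipeline",
  "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
  "Containers": [
    {
      "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/preprocess:latest",
      "Environment": {"SCALING": "standard"}
    },
    {
      "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/model:latest",
      "ModelDataUrl": "s3://my-bucket/model.tar.gz"
    },
    {
      "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/postprocess:latest"
    }
  ]
}
```

Here the preprocessing container runs first, its output feeds the model container (whose weights load from the `ModelDataUrl` artifact), and the postprocessing container formats the final response.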
Deployment overview
Typical steps to deploy an inference pipeline:
- Build container images for preprocessing, model serving, and postprocessing.
- Push the images to a container registry (Amazon ECR).
- Create a pipeline JSON that references each container image and configuration.
- Use the SageMaker SDK or API to create the Inference Pipeline Model and deploy a real-time endpoint that uses it.
- Send inference requests to the endpoint — each request will traverse preprocess → model → postprocess automatically.
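The create-and-deploy steps above can be sketched as plain request payloads. These dicts follow the shapes of boto3's create_model, create_endpoint_config, and create_endpoint calls; all names, ARNs, and image URIs are placeholders:

```python
# Sketch of the deployment call sequence as plain request payloads.
# In a real deployment each dict is passed to the matching boto3 call:
#   sm = boto3.client("sagemaker")
#   sm.create_model(**model_request)
#   sm.create_endpoint_config(**endpoint_config_request)
#   sm.create_endpoint(**endpoint_request)
# All names, ARNs, and image URIs are placeholders.

ecr = "123456789012.dkr.ecr.us-east-1.amazonaws.com"

model_request = {
    "ModelName": "my-inference-pipeline",
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",
    # The Containers order defines the execution sequence.
    "Containers": [
        {"Image": f"{ecr}/preprocess:latest"},
        {"Image": f"{ecr}/model:latest"},
        {"Image": f"{ecr}/postprocess:latest"},
    ],
}

endpoint_config_request = {
    "EndpointConfigName": "my-pipeline-config",
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": model_request["ModelName"],
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
}

endpoint_request = {
    "EndpointName": "my-pipeline-endpoint",
    "EndpointConfigName": endpoint_config_request["EndpointConfigName"],
}
```

Once the endpoint is in service, every request sent to it traverses the three containers in the order listed in `Containers`.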
How inference pipelines compare with other SageMaker inference options
Below is a concise comparison to help you decide the right option for your workload.

| Option | Best for | Key characteristics |
|---|---|---|
| SageMaker Batch Transform | Offline, large-scale predictions | No persistent endpoint; spins up compute, processes data, writes outputs, then tears down resources. |
| Asynchronous Inference | Large payloads or non-immediate results | Returns acknowledgement; results delivered later (SNS, S3). Payloads up to ~1 GB; backend can scale to zero. |
| Serverless Endpoints | Real-time with unpredictable/bursty traffic | Scales compute on demand; reduces always-on cost; may incur cold-start latency. |
| SageMaker Feature Store | Low-latency feature retrieval for online inference | Stores pre-computed features (aggregates) usable by both training and online inference. |
| SageMaker Inference Pipelines | Wrapping preprocessing/postprocessing with model for real-time requests | Chains containers in a single real-time request (preprocess → model → postprocess). |
Best practices and considerations
- Keep preprocessing/postprocessing lightweight for low-latency requirements; heavy transformations may increase response times.
- If preprocessing requires heavy computation or shared historical data, consider SageMaker Feature Store or batch/asynchronous inference instead.
- Monitor and log each container’s behavior to diagnose latency or serialization issues between stages.
- Ensure all containers conform to SageMaker’s expected input/output formats so the pipeline passes data correctly.
Links and references
- AWS SageMaker Documentation
- SageMaker Feature Store
- SageMaker Asynchronous Inference
- SageMaker Batch Transform
- KodeKloud SageMaker course
