In this lesson we examine a common advanced inference scenario: transforming data before and after a model. When raw input can't be consumed directly by a model, you need preprocessing (scaling, encoding, feature engineering) before inference, and often post-processing (formatting, thresholding, filtering) after the model returns predictions. Two core questions arise:
  • Where should preprocessing and post-processing run?
  • How can you combine those steps into a single inference request?
Amazon SageMaker addresses this with SageMaker Inference Pipelines — a mechanism for chaining multiple containers (preprocess → model → postprocess) so a single real-time inference request flows through them in sequence. Note that a SageMaker inference pipeline is distinct from a SageMaker Pipeline used for training and CI/CD orchestration; the inference pipeline specifically composes multiple containers for one inference request.

Conceptual flow

A SageMaker real-time endpoint can host multiple containers on the same instance. You define the execution order so each incoming request is passed through the containers in sequence:
  1. The endpoint receives the inference request and forwards it to the first container (preprocessing).
  2. The preprocessing container transforms the raw input (scaling, encoding, feature generation) and returns transformed data.
  3. The transformed data is passed to the model container, which runs inference and emits predictions.
  4. The model output is forwarded to the postprocessing container, which formats or filters predictions for the client.
This encapsulates the complete inference path, from raw input to client-ready output, inside a single real-time request. SageMaker handles the data transfer between containers: each container simply accepts input and returns output, and SageMaker wires them together in the order you specify, so you can focus on the data-science workflow rather than low-level container plumbing.
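The flow above can be sketched as three plain Python functions, with each stage's output feeding the next. This is an illustration of the data flow only, not SageMaker code; the feature names, weights, and threshold are invented for the example.

```python
# Illustrative sketch of the pipeline flow: preprocess -> model -> postprocess.
import json

def preprocess(raw):
    """Scale a numeric feature and one-hot encode a category (toy transforms)."""
    record = json.loads(raw)
    scaled = record["age"] / 100.0
    color = [1.0 if record["color"] == c else 0.0
             for c in ("red", "green", "blue")]
    return [scaled] + color

def predict(features):
    """Stand-in model: a fixed linear score."""
    weights = [0.5, 1.0, -1.0, 0.2]
    return sum(w * x for w, x in zip(weights, features))

def postprocess(score):
    """Threshold the raw score into a client-friendly label."""
    return {"label": "positive" if score > 0 else "negative",
            "score": round(score, 3)}

# A single "request" traverses the stages in order, just as a request
# traverses the containers in an inference pipeline.
response = postprocess(predict(preprocess('{"age": 42, "color": "red"}')))
```

In a real pipeline each function would live in its own container, and SageMaker would carry the intermediate payloads between them.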

Defining an inference pipeline

You define the pipeline by listing the containers in the sequence you want them to run. With the low-level API, that list goes in the Containers field of a CreateModel request; with the SageMaker Python SDK, you wrap the container definitions in a PipelineModel and call deploy() on it. Deployment and endpoint configuration are otherwise the same as for a single-container model; the container list simply tells SageMaker which containers to invoke and in what order. Example CreateModel request (JSON; the role ARN and image URIs are placeholders):
{
  "ModelName": "my-inference-pipeline",
  "ExecutionRoleArn": "arn:aws:iam::111122223333:role/SageMakerExecutionRole",
  "Containers": [
    {
      "ContainerHostname": "preprocessing-container",
      "Image": "preprocessing-image-uri",
      "Environment": {
        "PREPROCESSING_PARAMS": "value"
      }
    },
    {
      "ContainerHostname": "model-container",
      "Image": "model-image-uri",
      "Environment": {
        "MODEL_PARAMS": "value"
      }
    },
    {
      "ContainerHostname": "postprocessing-container",
      "Image": "postprocessing-image-uri",
      "Environment": {
        "POSTPROCESSING_PARAMS": "value"
      }
    }
  ]
}
Notes:
  • The array order in "Containers" defines the execution sequence; output from container N becomes the input for container N+1.
  • Each container must implement the SageMaker inference contract (accept requests on /invocations, answer health checks on /ping).
  • You can set container-specific Environment variables to configure behavior (e.g., scaling parameters or model hyperparameters).
  • A container that loads model artifacts can additionally specify a ModelDataUrl pointing to its artifacts in S3.
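As a sketch of how such a definition might be assembled programmatically, the function below builds a CreateModel-style request in Python (the low-level API names this field Containers). The model name, role ARN, and image URIs are placeholders, and the actual AWS call, shown commented out, would require boto3 and credentials.

```python
def build_pipeline_model_request(model_name, role_arn, containers):
    """Assemble a CreateModel-style request for an inference pipeline.
    The order of `containers` defines the execution order."""
    return {
        "ModelName": model_name,
        "ExecutionRoleArn": role_arn,  # placeholder; a real role is required
        "Containers": [
            {"Image": c["image"], "Environment": c.get("env", {})}
            for c in containers
        ],
    }

request = build_pipeline_model_request(
    "my-inference-pipeline",
    "arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder ARN
    [
        {"image": "preprocessing-image-uri", "env": {"PREPROCESSING_PARAMS": "value"}},
        {"image": "model-image-uri"},
        {"image": "postprocessing-image-uri"},
    ],
)
# A real deployment would then call:
# boto3.client("sagemaker").create_model(**request)
```

With the SageMaker Python SDK, the equivalent is a PipelineModel built from a list of Model objects, followed by deploy().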

Deployment overview

Typical steps to deploy an inference pipeline:
  1. Build container images for preprocessing, model serving, and postprocessing.
  2. Upload images to a container registry (ECR).
  3. Create a pipeline JSON that references each container image and configuration.
  4. Use the SageMaker SDK or API to create the Inference Pipeline Model and deploy a real-time endpoint that uses it.
  5. Send inference requests to the endpoint — each request will traverse preprocess → model → postprocess automatically.
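Step 5 can be sketched by assembling the parameters for an invoke_endpoint call. The endpoint name and payload here are hypothetical, and the real call (commented out) requires boto3 and AWS credentials. Note that the client sends raw input and receives post-processed output; the intermediate stages stay hidden behind the endpoint.

```python
import json

def build_invoke_request(endpoint_name, payload):
    """Assemble parameters for a sagemaker-runtime invoke_endpoint call.
    The whole pipeline (preprocess -> model -> postprocess) runs behind
    this single request."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps(payload),
    }

req = build_invoke_request("my-inference-pipeline-endpoint",  # hypothetical name
                           {"age": 42, "color": "red"})
# Real call (requires boto3 and AWS credentials):
# response = boto3.client("sagemaker-runtime").invoke_endpoint(**req)
# predictions = json.loads(response["Body"].read())
```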

How inference pipelines compare with other SageMaker inference options

Below is a concise comparison to help decide the right option for your workload.
  • SageMaker Batch Transform: best for offline, large-scale predictions. No persistent endpoint; spins up compute, processes data, writes outputs, then tears down resources.
  • Asynchronous Inference: best for large payloads or results that aren’t needed immediately. Returns an acknowledgement, with results delivered later (S3, with SNS notification). Payloads up to ~1 GB; the backend can scale to zero.
  • Serverless Endpoints: best for real-time workloads with unpredictable or bursty traffic. Scales compute on demand and reduces always-on cost, but may incur cold-start latency.
  • SageMaker Feature Store: best for low-latency feature retrieval during online inference. Stores pre-computed features (e.g., aggregates) usable by both training and online inference.
  • SageMaker Inference Pipelines: best for wrapping preprocessing and postprocessing around a model for real-time requests. Chains containers so a single request flows through preprocess → model → postprocess.
Use inference pipelines when you want to encapsulate preprocessing and postprocessing close to the model, keep the inference request flow simple for clients, and avoid introducing a separate preprocessing service or feature store for simple transformations.

Best practices and considerations

  • Keep preprocessing/postprocessing lightweight for low-latency requirements; heavy transformations may increase response times.
  • If preprocessing requires heavy computation or shared historical data, consider using a Feature Store or Batch/Asynchronous pipelines instead.
  • Monitor and log each container’s behavior to diagnose latency or serialization issues between stages.
  • Ensure all containers conform to SageMaker’s expected input/output formats so the pipeline passes data correctly.
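The input/output contract in the last bullet can be illustrated with the standard library alone: SageMaker expects each serving container to answer GET /ping health checks and POST /invocations inference requests over HTTP (a single-container endpoint uses port 8080; pipeline containers receive their port via the SAGEMAKER_BIND_TO_PORT environment variable). The handler below is a minimal sketch with a placeholder pass-through transform, not production serving code.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class InferenceHandler(BaseHTTPRequestHandler):
    """Minimal SageMaker-style serving contract: /ping and /invocations."""

    def do_GET(self):
        # Health check: SageMaker probes /ping to verify the container is up.
        self.send_response(200 if self.path == "/ping" else 404)
        self.end_headers()

    def do_POST(self):
        if self.path != "/invocations":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Placeholder transform: tag the payload and pass it through.
        body = json.dumps({"processed": True, **payload}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging in this sketch

# Inside a container you would serve on the port SageMaker assigns, e.g.:
# HTTPServer(("", 8080), InferenceHandler).serve_forever()
```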
This concludes the lesson. In the next article we’ll walk through creating a hosted endpoint using SageMaker.
[Slide: "Summary" listing five SageMaker capabilities: Batch Transform, Asynchronous Inference, Serverless Endpoints, Feature Store, and Inference Pipelines.]
