Demo: Deploy a Hosted Model Using SageMaker Endpoints
Tutorial for deploying a trained model to Amazon SageMaker by registering model packages, creating endpoint configurations and endpoints, and invoking predictions using Boto3 and the SageMaker Python SDK
In this lesson we’ll deploy a trained model to Amazon SageMaker by creating SageMaker model packages, endpoint configurations, and endpoints. The walkthrough shows how to perform the steps from SageMaker Studio (Jupyter notebook) and how to programmatically create and test endpoints using the Boto3 SDK and the SageMaker Python SDK.
Objectives
Build a SageMaker model package that associates a model artifact (model.tar.gz) with a container image.
Create an endpoint configuration and endpoint using the Boto3 SDK.
Create and test an endpoint using the SageMaker Python SDK, and compare Predictor-based inference with direct runtime invocation.
We run the demonstration from a Jupyter notebook inside SageMaker Studio. Open your Studio environment, create or open a notebook, and run the cells in order.
Step-by-step code (consolidated and corrected)
The following code is intended to run in Jupyter Notebook cells in SageMaker Studio.
It covers: imports, uploading a local model artifact to S3, fetching the correct container image URI, registering the model (model package), creating an endpoint configuration with Boto3, creating the endpoint, waiting for it to become InService, and invoking the endpoint.
```python
# Step 0: Imports and session setup
import time
import json
import boto3
import sagemaker
from sagemaker import get_execution_role, Model
from sagemaker.session import Session

sagemaker_session = Session()
role = get_execution_role()

# Define S3 bucket and prefix (use no leading slash to avoid double-slashes)
s3_bucket = sagemaker_session.default_bucket()
s3_prefix = "model"

print("Using role:", role)
print("Using bucket:", s3_bucket, "prefix:", s3_prefix)
```
Upload the local model artifact to S3 and look up the appropriate container image for the framework (Linear Learner in this example). We use sagemaker.image_uris.retrieve to get the correct ECR image URI for the current region.
```python
# Step 1: Upload model artifact and get container image URI
local_file_path = "model.tar.gz"  # local file produced by training/export

s3_uri = sagemaker_session.upload_data(local_file_path, bucket=s3_bucket, key_prefix=s3_prefix)
print(f"File uploaded to: {s3_uri}")

model_artifact = f"s3://{s3_bucket}/{s3_prefix}/model.tar.gz"
region = sagemaker_session.boto_region_name

# Retrieve the Linear Learner image URI for the region
container_image_uri = sagemaker.image_uris.retrieve(framework="linear-learner", region=region)
print(f"SageMaker Linear Learner Image URI: {container_image_uri}")

# Name for the model package
model_name = "house-prices-inference-demo"
```
Create and register the SageMaker model (model package) using the SageMaker SDK Model class. Calling model.create() registers this model in SageMaker (visible under Inference -> Models).
```python
# Step 2: Create SageMaker Model (model package)
model = Model(
    image_uri=container_image_uri,
    model_data=model_artifact,
    role=role,
    name=model_name,
)

# Register the model in SageMaker (create_model)
model.create()
print(f"Model registered with name: {model_name}")
```
You can verify the registered model in the SageMaker console. The next image shows the model details page where the container image and S3 model data location are associated with the model.
Create an endpoint configuration using Boto3. Endpoint configurations describe one or more production variants (models + instance types + instance counts + weights). Here we create a single variant that receives all traffic.
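A single-variant configuration can be sketched as follows. The configuration name, endpoint name, and instance type (`ml.m5.large`) are illustrative choices, not values from the original walkthrough; adjust them for your account and workload.

```python
# Step 3: Create an endpoint configuration with Boto3
import boto3

sagemaker_client = boto3.client("sagemaker")

model_name = "house-prices-inference-demo"
endpoint_config_name = "house-prices-endpoint-config"  # illustrative name

response = sagemaker_client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": model_name,
            "InstanceType": "ml.m5.large",   # small instance for testing; assumed
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,     # single variant, so it receives all traffic
        }
    ],
)
print(f"Endpoint configuration ARN: {response['EndpointConfigArn']}")
```

With multiple variants, `InitialVariantWeight` controls the traffic split between them; with one variant the weight is effectively ignored.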
Create the endpoint using the endpoint configuration. Endpoint creation is asynchronous — provisioning and starting the model container typically takes several minutes.
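A minimal sketch of the creation call is below; the endpoint and configuration names are illustrative and should match whatever names you used when creating the endpoint configuration.

```python
# Step 4: Create the endpoint from the endpoint configuration
import boto3

sagemaker_client = boto3.client("sagemaker")

endpoint_name = "house-prices-endpoint"                # illustrative name
endpoint_config_name = "house-prices-endpoint-config"  # must match Step 3

response = sagemaker_client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)
print(f"Endpoint ARN: {response['EndpointArn']}")
# create_endpoint returns immediately; provisioning continues in the background.
```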
You can inspect endpoint creation progress in the SageMaker console (Endpoints) or poll programmatically. The following screenshot shows an endpoint in the Creating state.
Below is a helper to poll the endpoint until it’s InService. Waiting prevents invocation errors while the endpoint is still provisioning.
```python
# Step 5: Wait until the endpoint is InService (simple polling)
sagemaker_client = boto3.client("sagemaker")

def wait_for_endpoint_in_service(sagemaker_client, endpoint_name, timeout_minutes=15):
    timeout = time.time() + timeout_minutes * 60
    while time.time() < timeout:
        resp = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
        status = resp["EndpointStatus"]
        print(f"Endpoint status: {status}")
        if status == "InService":
            print("Endpoint is InService.")
            return True
        if status in ("Failed", "OutOfService"):
            raise RuntimeError(f"Endpoint entered terminal state: {status}")
        time.sleep(30)
    raise TimeoutError(f"Timed out waiting for endpoint {endpoint_name} to become InService")

# Call the waiter (this may take several minutes)
wait_for_endpoint_in_service(sagemaker_client, endpoint_name)
```
You can also view the endpoint from SageMaker Studio’s Endpoints list as it transitions to InService.
When the endpoint is InService, invoke it using the SageMaker runtime (boto3 client sagemaker-runtime). Note: Boto3’s SageMaker client handles creation/describing of endpoints and endpoint configurations; sagemaker-runtime is used for invoking inferences.
```python
# Step 6: Invoke the endpoint (once InService)
sagemaker_runtime_client = boto3.client("sagemaker-runtime")

# Example CSV input (features should match the model's expected format)
input_data = "51.6215527,-0.2466031,2.0,6.0,517.0,3.0,289000.0,27.393364928909996,1055000.0,37.01298701298701,285000.0,1354000.0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0"

response = sagemaker_runtime_client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="text/csv",
    Body=input_data,
)

# Decode and print response body
prediction = response["Body"].read().decode("utf-8")
print(f"Predicted house price: {prediction}")
```
Notes and recommendations
If you want granular control over the endpoint configuration and lifecycle, use Boto3 (as shown). The SageMaker Python SDK provides higher-level abstractions for model + endpoint creation and also the Predictor class for inference, which can make development faster and code more concise. Choose the approach that best fits your automation, control, and CI/CD needs.
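For comparison, the same inference can be made through the SageMaker SDK's Predictor class, which wraps the runtime call and handles serialization. This sketch assumes `endpoint_name`, `sagemaker_session`, and `input_data` from the earlier steps, and assumes the container returns a JSON response body (swap the deserializer if your container responds differently).

```python
# Invoke the endpoint via the SageMaker SDK Predictor (higher-level alternative to Step 6)
from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker_session,
    serializer=CSVSerializer(),        # sends the payload as text/csv
    deserializer=JSONDeserializer(),   # parses the JSON response body
)

prediction = predictor.predict(input_data)
print(f"Predicted house price: {prediction}")
```

Note there is no boilerplate for content types or response decoding; the serializer and deserializer handle both ends of the call.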
Creating endpoints incurs AWS costs while instances are running. Use minimal instance sizes for testing, delete endpoints when not in use, and ensure IAM roles used by Studio have just the permissions needed for deployment.
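When you are done testing, cleanup can be sketched as below, assuming the `endpoint_name`, `endpoint_config_name`, and `model_name` variables from the earlier steps.

```python
# Clean up to stop incurring charges: delete the endpoint, its configuration, and the model
import boto3

sagemaker_client = boto3.client("sagemaker")

sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
sagemaker_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)
sagemaker_client.delete_model(ModelName=model_name)
print("Deleted endpoint, endpoint configuration, and model.")
```

Deleting the endpoint stops the billed instances; the configuration and model records are free to keep, but removing them keeps the account tidy.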
Summary
In this lesson we:
Uploaded a local model artifact to S3, retrieved the appropriate container image URI, and registered a SageMaker model package.
Created an endpoint configuration and endpoint using Boto3, polled until the endpoint became InService, and invoked the endpoint using the SageMaker runtime with CSV input.
The SageMaker Python SDK provides higher-level operations and a Predictor class for simplified inference flows; comparing both patterns helps determine which integrates best with your automation strategy.
You can now inspect the created model, endpoint configuration, and endpoint status in the Management Console, or proceed to implement the same flow using the SageMaker SDK’s high-level APIs and the Predictor class.