This guide evaluates trained models for deployment, covering data preprocessing, inference techniques, and performance metrics computation.
In this guide, we evaluate multiple trained models to identify the best performer for deployment. We cover every step, from loading and preprocessing unseen data, through real-time and batch inference, to computing performance metrics.
Begin by importing the necessary modules and defining an image preprocessing transformation. These transformations ensure the image data is in the required input format during inference.
Create a custom dataset class to load images along with their corresponding labels. In this example, the label encoding maps “malignant” to 0 and “benign” to 1.
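One way such a dataset class could look is sketched below. The CSV layout (a header row, then `relative/image/path,label` rows) is an assumption, and the sketch reads it with the standard-library `csv` module. Because the later code passes `label_encoding` (a dictionary) as `target_transform`, the sketch looks labels up in a dict and falls back to calling `target_transform` if it is a function instead.

```python
import csv
import os

from PIL import Image
from torch.utils.data import Dataset

# Label encoding described in the text: "malignant" -> 0, "benign" -> 1
label_encoding = {"malignant": 0, "benign": 1}


class CustomImageDataset(Dataset):
    """Sketch: yields (image, label) pairs listed in a CSV annotations file.

    ASSUMPTION: the CSV has a header row, then rows of `relative_path,label`.
    """

    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        with open(annotations_file, newline="") as f:
            reader = csv.reader(f)
            next(reader, None)  # skip the header row
            self.samples = [(row[0], row[1]) for row in reader if row]
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        rel_path, label = self.samples[idx]
        # Convert to RGB so grayscale or RGBA files still yield 3 channels
        image = Image.open(os.path.join(self.img_dir, rel_path)).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        if isinstance(self.target_transform, dict):
            label = self.target_transform[label]   # e.g. "benign" -> 1
        elif callable(self.target_transform):
            label = self.target_transform(label)
        return image, label
```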
After defining the dataset, instantiate it and create a DataLoader. The DataLoader processes the test data in batches (batch size of 32) without shuffling, which is standard for evaluation purposes.
```python
from torch.utils.data import DataLoader

# Instantiate the test dataset and create a DataLoader
test_dataset = CustomImageDataset(
    annotations_file='test_data.csv',
    img_dir='../../../',
    transform=test_transform,
    target_transform=label_encoding,
)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
```
Ensure that your CSV file and image directory paths are set correctly to avoid `FileNotFoundError` exceptions.
Load your pre-trained model by defining its network architecture and then loading the checkpoint containing the trained weights. Below is a sample implementation of a breast cancer classification model.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BreastCancerClassification(nn.Module):
    def __init__(self):
        super(BreastCancerClassification, self).__init__()
        # Define convolutional layers
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)
        # Define fully connected layers
        # (64 * 32 * 32 implies 128x128 inputs: two 2x2 poolings leave 32x32 feature maps)
        self.fc1 = nn.Linear(64 * 32 * 32, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 2)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 64 * 32 * 32)  # flatten the feature maps
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)  # raw logits for the two classes
        return x


# Create an instance of the model
model = BreastCancerClassification()

# Load the model checkpoint (only the weights are used here)
checkpoint = torch.load('model_checkpoint.tar', map_location=torch.device('cpu'))
model.load_state_dict(checkpoint['model_state_dict'])
```
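By default, `load_state_dict` raises an error if the checkpoint keys do not match the architecture; in a notebook, a successful call displays `<All keys matched successfully>`. Once the keys match, the model is ready for evaluation.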
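To see where a checkpoint with a `'model_state_dict'` key comes from, and how to inspect the key check programmatically, here is a round-trip sketch. It uses a hypothetical two-layer stand-in `TinyNet` instead of `BreastCancerClassification` and an in-memory `io.BytesIO` buffer in place of `model_checkpoint.tar`; the dictionary layout mirrors the loading code above but is otherwise an assumption about how the training script saved it.

```python
import io

import torch
import torch.nn as nn


# Hypothetical stand-in network; the same pattern applies to the real model.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


trained = TinyNet()

# ASSUMED checkpoint layout: a dict holding the weights under 'model_state_dict'
buffer = io.BytesIO()  # in-memory stand-in for 'model_checkpoint.tar'
torch.save({"model_state_dict": trained.state_dict()}, buffer)
buffer.seek(0)

restored = TinyNet()
checkpoint = torch.load(buffer, map_location=torch.device("cpu"))
result = restored.load_state_dict(checkpoint["model_state_dict"])

# load_state_dict returns the missing and unexpected keys; both lists are
# empty when the architecture matches the checkpoint.
print(result.missing_keys, result.unexpected_keys)  # [] []
```

With the default `strict=True`, a mismatch raises a `RuntimeError` before anything is returned; passing `strict=False` loads what it can and reports the mismatches in these two lists instead.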
Real-time inference handles a single prediction request as soon as it arrives. The example below loads an image, applies the test transformation, and runs inference to obtain a class prediction.
```python
import torch
from PIL import Image

# Open a sample image (convert to RGB in case it is grayscale or has an alpha channel)
image_path = 'sample-input.jpg'
image = Image.open(image_path).convert('RGB')

# Set the model to evaluation mode
model.eval()

# Apply the test transformation and add a batch dimension: (C, H, W) -> (1, C, H, W)
transformed_image = test_transform(image).unsqueeze(0)

# Perform inference without tracking gradients
with torch.no_grad():
    output = model(transformed_image)
_, predicted = torch.max(output, 1)  # index of the highest-scoring class

# Print the predicted class index
print(f"Class: {predicted.item()}")

# Map the predicted index back to a human-readable label
index_to_class_map = {v: k for k, v in label_encoding.items()}
print(f"Predicted label: {index_to_class_map[predicted.item()]}")
```
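For serving single requests, the steps above can be wrapped in one function. The sketch below is hypothetical (`predict_single` and the `FixedLogits` stand-in model are illustrative names, not part of the original code) and additionally converts the logits to probabilities with `F.softmax`, so each prediction comes with a confidence score.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def predict_single(model, image_tensor, index_to_class):
    """Hypothetical helper: classify one preprocessed image tensor of shape (C, H, W)."""
    model.eval()                                     # inference behavior for dropout/batch norm
    with torch.no_grad():                            # gradients are not needed at inference time
        logits = model(image_tensor.unsqueeze(0))    # add batch dimension -> (1, C, H, W)
        probs = F.softmax(logits, dim=1)             # convert logits to class probabilities
        confidence, index = torch.max(probs, dim=1)  # most likely class and its probability
    return index_to_class[index.item()], confidence.item()


# Hypothetical stand-in model that always outputs the logits [0.0, 2.0]
class FixedLogits(nn.Module):
    def forward(self, x):
        return torch.tensor([[0.0, 2.0]]).expand(x.shape[0], 2)


index_to_class = {0: "malignant", 1: "benign"}
label, confidence = predict_single(FixedLogits(), torch.rand(3, 128, 128), index_to_class)
print(label, round(confidence, 4))  # benign 0.8808
```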
Batch inference processes multiple images simultaneously. This approach mirrors the method used during training. The following code demonstrates batch processing using the DataLoader.
```python
# Batch inference example
print(f"Batch size: {test_loader.batch_size}")
print(f"Total images in dataset: {len(test_dataset)}")

# Process and predict in batches
for i, data in enumerate(test_loader, 0):
    inputs, labels = data
    outputs = model(inputs)
    _, predicted = torch.max(outputs.data, 1)
    print(f"Batch {i + 1} prediction: {predicted}")
```
Since the dataset contains 100 images and the batch size is 32, the DataLoader returns three full batches of 32 images and a final batch of 4 images (3 × 32 + 4 = 100).

It is good practice to perform inference without tracking gradients, which speeds up processing and lowers memory consumption:
```python
# Batch inference with gradient tracking disabled
with torch.no_grad():
    for i, data in enumerate(test_loader, 0):
        inputs, labels = data
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        print(f"Batch {i + 1} prediction: {predicted}")
```
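The batch arithmetic for 100 images with a batch size of 32 can be checked without the real images. This sketch substitutes a hypothetical stand-in dataset of 100 small random tensors wrapped in `TensorDataset`.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the 100-image test set: random tensors plus dummy labels
images = torch.randn(100, 3, 8, 8)
labels = torch.randint(0, 2, (100,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=False)

batch_sizes = [inputs.shape[0] for inputs, _ in loader]
print(batch_sizes)                        # [32, 32, 32, 4]
print(len(loader), math.ceil(100 / 32))   # 4 4
```

`DataLoader` keeps the incomplete last batch because `drop_last` defaults to `False`; setting it to `True` would silently skip the final 4 images, which is rarely what you want during evaluation.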
You can also time these loops using magic commands like %%time in a Jupyter Notebook to compare performance.
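Outside a notebook, the same comparison can be made with `time.perf_counter`. The sketch below uses a hypothetical stand-in model and random batches in place of `model` and `test_loader`, and toggles gradient tracking with `torch.set_grad_enabled`. Absolute timings depend on hardware, so it only reports them; the `requires_grad` flag on the outputs shows whether an autograd graph was built.

```python
import time

import torch
import torch.nn as nn

# Hypothetical stand-in model and data; substitute your own model and test_loader.
demo_model = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 2)
)
batches = [torch.randn(32, 3, 32, 32) for _ in range(4)]
demo_model.eval()


def run(track_grad):
    """Time one pass over the batches with gradient tracking on or off."""
    start = time.perf_counter()
    with torch.set_grad_enabled(track_grad):
        for inputs in batches:
            outputs = demo_model(inputs)
    return time.perf_counter() - start, outputs.requires_grad


t_grad, req_grad = run(True)
t_nograd, req_nograd = run(False)
print(f"with grad: {t_grad:.4f}s, no_grad: {t_nograd:.4f}s")
print(req_grad, req_nograd)  # True False
```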
TorchMetrics is a library of ready-made metric implementations for PyTorch that accumulate results across batches during model evaluation. In the example below, we compute accuracy from simulated predictions for a three-class classification problem.
```python
import torch
import torchmetrics

# Simulated network outputs (logits) for a 3-class classification problem
num_classes = 3
inputs_simulated = torch.tensor([
    [2.0, 1.0, 0.1],
    [0.5, 2.5, 0.1],
    [0.1, 0.4, 3.0],
    [0.3, 1.5, 2.1],
    [0.0, 2.0, 3.0],
    [2.0, 1.5, 0.6],
    [1.0, 0.5, 2.0],
])

# True labels, one per simulated output row
# (some intentionally disagree with the predictions for demonstration)
true_labels = torch.tensor([1, 1, 2, 0, 2, 1, 1])

# Initialize the accuracy metric for a multi-class task
accuracy_metric = torchmetrics.Accuracy(task="multiclass", num_classes=num_classes)

# Simulate predictions from the model: pick the highest-scoring class per row
_, predicted_simulated = torch.max(inputs_simulated, 1)

# Update and compute the accuracy metric
accuracy_metric.update(predicted_simulated, true_labels)
final_accuracy = accuracy_metric.compute()
print(f"Simulated Accuracy: {final_accuracy.item() * 100:.2f}%")
```
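Working the simulated example by hand makes the metric easier to trust. The seven logit rows need exactly seven labels, so the sketch pairs each row with the first seven values of `true_labels`. It computes accuracy two ways: "micro" (correct predictions divided by all predictions) and "macro" (the mean of per-class accuracy). TorchMetrics' `Accuracy` accepts an `average` argument with both options, and, as this data shows, the two can differ, so it is worth setting `average` explicitly when results must be comparable.

```python
import torch

# Logits from the simulated example (7 samples, 3 classes)
logits = torch.tensor([
    [2.0, 1.0, 0.1],
    [0.5, 2.5, 0.1],
    [0.1, 0.4, 3.0],
    [0.3, 1.5, 2.1],
    [0.0, 2.0, 3.0],
    [2.0, 1.5, 0.6],
    [1.0, 0.5, 2.0],
])
labels = torch.tensor([1, 1, 2, 0, 2, 1, 1])  # one label per logit row

preds = logits.argmax(dim=1)
print(preds.tolist())  # [0, 1, 2, 2, 2, 0, 2]

# Micro accuracy: correct predictions / all predictions
micro = (preds == labels).float().mean().item()
print(f"micro: {micro:.4f}")  # 3/7 -> 0.4286

# Macro accuracy: mean of per-class accuracy (assumes every class occurs in labels)
per_class = []
for c in range(3):
    mask = labels == c
    per_class.append((preds[mask] == c).float().mean().item())
macro = sum(per_class) / len(per_class)
print(f"macro: {macro:.4f}")  # (0 + 0.25 + 1) / 3 -> 0.4167
```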
Integrate TorchMetrics directly into your test loop to evaluate the model on the entire test dataset. The code below sets the model to evaluation mode, disables gradient tracking, and then computes the overall accuracy.
```python
import torch
import torchmetrics

# Initialize the accuracy metric; the two classes (malignant/benign) are handled
# as a 2-class multiclass task, matching the model's two output logits
accuracy_metric = torchmetrics.Accuracy(task="multiclass", num_classes=2)

# Set the model to evaluation mode and disable gradient tracking
model.eval()
with torch.no_grad():
    for i, data in enumerate(test_loader, 0):
        inputs, labels = data
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        accuracy_metric.update(predicted, labels)  # accumulate statistics for this batch

# Compute and display the final test accuracy over all batches
final_accuracy = accuracy_metric.compute()
print(f"Test Accuracy: {final_accuracy.item() * 100:.2f}%")
```
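If you run multiple evaluations in succession, call the metric's `reset()` method before each new evaluation pass. Resetting between batches would discard the statistics accumulated so far, while never resetting would mix results from separate runs.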
In addition to TorchMetrics, you can implement a custom metric. The following example demonstrates how to calculate accuracy manually by comparing model predictions with true labels.
```python
import torch

# Custom accuracy calculation
correct = 0
total = 0
model.eval()
with torch.no_grad():
    for i, data in enumerate(test_loader, 0):
        inputs, labels = data
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)                        # samples in this batch
        correct += (predicted == labels).sum().item()  # correct predictions in this batch

# Print the custom computed accuracy percentage (true division, not floor division)
print(f'Custom Accuracy: {100 * correct / total:.2f}%')
```
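A custom metric can also report more than a single number. The hypothetical `PerClassAccuracy` class below follows the same `update` / `compute` / `reset` pattern as TorchMetrics but is plain PyTorch; it counts correct and total predictions per class with `torch.bincount`, which shows whether errors concentrate on one class (for example, malignant cases being misclassified).

```python
import torch


class PerClassAccuracy:
    """Hypothetical custom metric: accumulates correct/total counts per class."""

    def __init__(self, num_classes):
        self.num_classes = num_classes
        self.reset()

    def reset(self):
        self.correct = torch.zeros(self.num_classes, dtype=torch.long)
        self.total = torch.zeros(self.num_classes, dtype=torch.long)

    def update(self, predicted, labels):
        # bincount tallies how many labels (or correctly predicted labels) fall in each class
        self.total += torch.bincount(labels, minlength=self.num_classes)
        self.correct += torch.bincount(labels[predicted == labels], minlength=self.num_classes)

    def compute(self):
        overall = self.correct.sum().item() / max(self.total.sum().item(), 1)
        # clamp avoids division by zero; classes with no samples report 0.0
        per_class = (self.correct.float() / self.total.clamp(min=1).float()).tolist()
        return overall, per_class


# Usage with two small hand-made batches (0 = malignant, 1 = benign)
metric = PerClassAccuracy(num_classes=2)
metric.update(torch.tensor([0, 1, 1, 0]), torch.tensor([0, 1, 0, 0]))
metric.update(torch.tensor([1, 1]), torch.tensor([1, 0]))
overall, per_class = metric.compute()
print(round(overall, 4), per_class)  # 0.6667 [0.5, 1.0]
```

In the evaluation loop, `metric.update(predicted, labels)` would replace the two counter lines.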
In this guide, we demonstrated the comprehensive process of model evaluation:
Data Preparation: Importing modules, setting up the transformation pipeline, and creating a custom dataset.
Model Loading: Initializing the network architecture and loading pre-trained weights.
Inference Techniques: Running both real-time single-image inference and batch inference.
Performance Metrics: Computing accuracy using both TorchMetrics and custom metric implementations to assess model performance.
While the demonstration produced an accuracy of around 49%, likely because of limited data and few training epochs, the evaluation process detailed here can be extended to larger, more diverse datasets to confirm robust model performance before production deployment.

Happy evaluating!