
- Document labeling and model training: Learn how to label documents and annotate text spans for both classification and named-entity extraction. Labeling is the manual process of tagging documents or text fragments with the categories and entity types you want the model to learn; those labeled examples form the training set for a custom machine-learning model.
- Performance evaluation: Learn to evaluate custom models with standard metrics such as precision, recall, and F1 score. These metrics quantify model behavior on held-out test data, reveal weaknesses (for example, poor recall on rare classes), and guide iterative improvement.
- Model deployment: Learn how to deploy a trained model as a REST API endpoint so your application can call it in real time to classify new documents or extract custom entities.
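To make the labeling step concrete, here is a minimal Python sketch of what a single labeled training example can look like, combining a document-level class with character-offset entity spans. The schema (`text`, `doc_label`, `entities` with `start`/`end` offsets) is an illustrative assumption, not any specific labeling tool's export format.

```python
# A hypothetical labeled example for combined classification and entity
# extraction. The schema is illustrative; real labeling tools define their own.
labeled_doc = {
    "text": "Patient was prescribed 20mg of Lisinopril daily.",
    "doc_label": "clinical_note",  # document-level classification label
    "entities": [
        # Each span records an entity type and its character offsets in "text".
        {"type": "Dosage", "start": 23, "end": 27},       # "20mg"
        {"type": "Medication", "start": 31, "end": 41},   # "Lisinopril"
    ],
}

def span_text(doc: dict, entity: dict) -> str:
    """Recover the surface text of an annotated span from its offsets."""
    return doc["text"][entity["start"]:entity["end"]]
```

Keeping offsets rather than copied strings makes annotations verifiable: `span_text` must reproduce the exact fragment the annotator labeled, which is an easy consistency check before training.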
| Objective | Key activities | Deliverable / outcome | Example use case |
|---|---|---|---|
| Document labeling & training | Define labels, annotate examples, prepare datasets, run training | A trained custom model ready for evaluation | Classify invoices vs. contracts; extract medication names from clinical notes |
| Performance evaluation | Split data, calculate precision/recall/F1, analyze confusion matrix | Metrics and error analysis guiding data/model improvements | Identify low-performing classes and add more labeled examples |
| Model deployment | Create REST endpoint, secure access, monitor predictions | Production endpoint for real-time inference and monitoring | Integrate into ingestion pipeline to tag documents on arrival |
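Once a model is deployed as a REST endpoint, calling it from an application typically means POSTing a JSON document and parsing the JSON reply. The sketch below uses only Python's standard library; the endpoint URL, the `Api-Key` header name, and the `documents` payload shape are placeholders, since each service defines its own request contract and authentication scheme.

```python
import json
import urllib.request

def build_payload(text: str, doc_id: str = "1") -> bytes:
    """Encode a single-document request body (illustrative schema)."""
    return json.dumps({"documents": [{"id": doc_id, "text": text}]}).encode("utf-8")

def classify_document(endpoint_url: str, api_key: str, text: str) -> dict:
    """POST a document to a deployed classification endpoint.

    The 'Api-Key' header and payload layout are hypothetical placeholders;
    substitute your service's actual schema and auth mechanism.
    """
    request = urllib.request.Request(
        endpoint_url,
        data=build_payload(text),
        headers={"Content-Type": "application/json", "Api-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating `build_payload` from the network call keeps the request schema testable without a live endpoint, which also helps when the pipeline later needs to log or replay requests for monitoring.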
Use prebuilt text analytics features when they meet your needs (e.g., general entity recognition for common named entities). Choose custom models when you must detect domain-specific entities or apply organization-specific classifications that prebuilt models cannot capture.
- Labeling guidelines: tips to create high-quality, consistent annotations (for example: label spans consistently, define clear label definitions, and include edge cases).
- Balanced datasets: approaches to handle class imbalance such as targeted labeling, data augmentation, or sampling strategies.
- Iterative evaluation: how to use metrics and error analysis to prioritize where to add more labeled data or adjust modeling choices.
- Monitoring after deployment: methods for tracking model drift, collecting real-world feedback, and scheduling re-training.
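The iterative-evaluation point above can be sketched in a few lines: compute per-class precision, recall, and F1 from parallel lists of gold and predicted labels, then look for classes with low recall as candidates for more labeled data. This is a plain-Python illustration of the standard definitions, not any particular evaluation library's API.

```python
from collections import Counter

def per_class_metrics(gold: list, predicted: list) -> dict:
    """Return {label: (precision, recall, f1)} for parallel label lists.

    Undefined ratios (no predictions or no gold examples for a label)
    fall back to 0.0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but wrong
            fn[g] += 1          # g was missed
    metrics = {}
    for label in set(gold) | set(predicted):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics
```

For example, a class whose recall is far below its precision is being missed rather than confused, which usually signals too few labeled examples of that class in the training set.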
- Introduction to Named Entity Recognition (NER): conceptual overview of entity extraction.
- Evaluation metrics for classification: a primer on precision, recall, and F1.
- Text analytics and custom model guidance: vendor documentation and examples for deploying text analytics solutions.