Custom classification and named-entity extraction

Text analytics platforms provide powerful prebuilt capabilities, such as entity recognition and document classification, that work well out of the box and require no training. These prebuilt features suit general scenarios such as detecting people's names, dates, or common PII in documents. When your application needs to recognize domain-specific items (for example, medical terminology, contract clauses, or proprietary product SKUs), or to apply your organization's own document categories, you need custom models trained on labeled examples. This module walks through the full lifecycle: labeling data, training models, evaluating performance, and deploying production endpoints for real-time inference.
Learning objectives (overview)
  • Document labeling and model training
    Learn how to label documents and annotate text spans for both classification and named-entity extraction. Labeling is the manual process of tagging documents or text fragments with the categories and entity types you want the model to learn. Those labeled examples form the training set for a custom machine-learning model.
  • Performance evaluation
    Learn to evaluate custom models with standard metrics such as precision, recall, and F1 score. These metrics quantify model behavior on held-out test data, reveal weaknesses (for example, poor recall on rare classes), and guide iterative improvements.
  • Model deployment
    Learn how to deploy a trained model as a REST API endpoint so your application can call it in real time to classify new documents or extract custom entities.
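The labeling step above produces a training set of annotated documents. The record shape below is illustrative only (not any specific platform's schema): each document carries a document-level category for classification plus character-offset entity spans for extraction.

```python
# Hypothetical labeled examples (illustrative format, not a specific
# platform's schema): one record per document, with a document-level
# category and character-offset entity spans.
labeled_docs = [
    {
        "text": "Patient was prescribed 20mg of atorvastatin daily.",
        "category": "ClinicalNote",
        "entities": [
            # span offsets are character positions into "text"
            {"label": "Medication", "start": 31, "end": 43},
            {"label": "Dosage", "start": 23, "end": 27},
        ],
    },
    {
        "text": "This agreement terminates on 31 December 2026.",
        "category": "Contract",
        "entities": [
            {"label": "TerminationDate", "start": 29, "end": 45},
        ],
    },
]

def span_text(doc, entity):
    """Return the surface text covered by an entity span."""
    return doc["text"][entity["start"]:entity["end"]]
```

Consistent span boundaries (for example, whether dosage units are inside the span) are exactly the kind of decision your labeling guidelines should pin down before annotation starts.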
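The evaluation metrics named above reduce to simple arithmetic over counts of true positives, false positives, and false negatives. A minimal sketch:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts.

    precision = tp / (tp + fp)  # of the items we predicted, how many were right
    recall    = tp / (tp + fn)  # of the true items, how many we found
    f1        = harmonic mean of precision and recall
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example: a class with 8 correct extractions, 2 spurious, 4 missed.
p, r, f = precision_recall_f1(tp=8, fp=2, fn=4)
# p = 0.8, r = 8/12 ≈ 0.667, f ≈ 0.727
```

Computed per class, these numbers surface exactly the weakness mentioned above: a rare class with high precision but low recall usually needs more labeled examples rather than a different model.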
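Calling a deployed model from an application typically means POSTing JSON to the endpoint and parsing the prediction out of the response. The endpoint URL, auth header, and payload shape below are assumptions for illustration; substitute your deployment's actual contract.

```python
import json
import urllib.request

# Hypothetical endpoint and key -- substitute your deployment's values.
ENDPOINT = "https://example.invalid/classify"
API_KEY = "<your-api-key>"

def build_request(text):
    """Build an HTTP request that sends one document for classification.
    The payload shape and auth header are illustrative assumptions."""
    payload = json.dumps({"documents": [{"id": "1", "text": text}]}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

def parse_response(body):
    """Extract (category, confidence) pairs from a response body
    shaped like the hypothetical contract above."""
    result = json.loads(body)
    return [(d["category"], d["confidence"]) for d in result["documents"]]

# Parsing a sample response (no network call needed to try this out):
sample = '{"documents": [{"id": "1", "category": "Invoice", "confidence": 0.97}]}'
# parse_response(sample) -> [("Invoice", 0.97)]
```

Keeping the request-building and response-parsing logic in small, testable functions like these makes it easy to swap in the real endpoint contract later.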
Quick-reference: objectives and outcomes
Objective | Key activities | Deliverable / outcome | Example use case
Document labeling & training | Define labels, annotate examples, prepare datasets, run training | A trained custom model ready for evaluation | Classify invoices vs. contracts; extract medication names from clinical notes
Performance evaluation | Split data, calculate precision/recall/F1, analyze confusion matrix | Metrics and error analysis guiding data/model improvements | Identify low-performing classes and add more labeled examples
Model deployment | Create REST endpoint, secure access, monitor predictions | Production endpoint for real-time inference and monitoring | Integrate into ingestion pipeline to tag documents on arrival
Use prebuilt text analytics features when they meet your needs (e.g., general entity recognition for common named entities). Choose custom models when you must detect domain-specific entities or apply organization-specific classifications that prebuilt models cannot capture.
Best practices covered in this module
  • Labeling guidelines: tips to create high-quality, consistent annotations (for example: label spans consistently, define clear label definitions, and include edge cases).
  • Balanced datasets: approaches to handle class imbalance such as targeted labeling, data augmentation, or sampling strategies.
  • Iterative evaluation: how to use metrics and error analysis to prioritize where to add more labeled data or adjust modeling choices.
  • Monitoring after deployment: methods for tracking model drift, collecting real-world feedback, and scheduling re-training.
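Of the imbalance-handling approaches listed above, the simplest sampling strategy is random oversampling: duplicating minority-class examples until class counts match. A minimal sketch; targeted labeling of genuinely new minority-class data is usually the better long-term fix.

```python
import random

def oversample(docs, get_label, seed=0):
    """Naive random oversampling: duplicate examples from minority
    classes until every class matches the largest class's count.
    A sketch only -- duplicated examples add no new information."""
    rng = random.Random(seed)
    by_label = {}
    for d in docs:
        by_label.setdefault(get_label(d), []).append(d)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # draw (with replacement) enough duplicates to reach the target
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced

docs = [{"label": "Invoice"}] * 6 + [{"label": "Contract"}] * 2
balanced = oversample(docs, lambda d: d["label"])
# Both classes now have 6 examples each (12 total).
```

Whatever strategy you choose, apply it only to the training split; the evaluation split should keep the real-world class distribution so metrics stay honest.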
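One lightweight way to track drift after deployment is to compare the model's predicted-label distribution in a recent window against a baseline window. The sketch below uses total variation distance; the metric choice and the alert threshold are assumptions to tune per application.

```python
from collections import Counter

def label_distribution(labels):
    """Map each label to its fraction of the given predictions."""
    total = len(labels)
    return {label: n / total for label, n in Counter(labels).items()}

def distribution_shift(baseline, current):
    """Total variation distance between two label distributions:
    0.0 = identical, 1.0 = completely disjoint. A large value can
    signal drift and a need to re-label and re-train."""
    labels = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(l, 0.0) - current.get(l, 0.0))
                     for l in labels)

baseline = label_distribution(["Invoice"] * 70 + ["Contract"] * 30)
current = label_distribution(["Invoice"] * 40 + ["Contract"] * 60)
shift = distribution_shift(baseline, current)
# shift = 0.3 here; e.g. flag for human review if shift > 0.1
```

A distribution shift alone doesn't prove the model is wrong (the real-world mix of documents may genuinely have changed), which is why pairing it with sampled human review of recent predictions is worthwhile.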
Links and references

Throughout this module you will learn practical steps and tools to create robust custom text models, plus workflows to maintain model quality after deployment.
