
- Document labeling and model training: Learn how to label documents and annotate text spans for both classification and named-entity extraction. Labeling is the manual process of tagging documents or text fragments with the categories and entity types you want the model to learn; those labeled examples form the training set for a custom machine-learning model.
- Performance evaluation: Learn to evaluate custom models with standard metrics such as precision, recall, and F1 score. These metrics quantify model behavior on held-out test data, reveal weaknesses (for example, poor recall on rare classes), and guide iterative improvement.
- Model deployment: Learn how to deploy a trained model as a REST API endpoint so your application can call it in real time to classify new documents or extract custom entities.
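To make the labeling step concrete, here is a minimal Python sketch of what a single labeled training example can look like, combining a document-level class with character-offset entity spans. The schema (`text`, `doc_label`, `entities` with `start`/`end` offsets) is an illustrative assumption, not any specific labeling tool's export format.

```python
# A hypothetical labeled example for combined classification and entity
# extraction. The schema is illustrative; real labeling tools define their own.
labeled_doc = {
    "text": "Patient was prescribed 20mg of Lisinopril daily.",
    "doc_label": "clinical_note",  # document-level classification label
    "entities": [
        # Each span records an entity type and its character offsets in "text".
        {"type": "Dosage", "start": 23, "end": 27},       # "20mg"
        {"type": "Medication", "start": 31, "end": 41},   # "Lisinopril"
    ],
}

def span_text(doc: dict, entity: dict) -> str:
    """Recover the surface text of an annotated span from its offsets."""
    return doc["text"][entity["start"]:entity["end"]]
```

Keeping offsets rather than copied strings makes annotations verifiable: `span_text` must reproduce the exact fragment the annotator labeled, which is an easy consistency check before training.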
| Objective | Key activities | Deliverable / outcome | Example use case |
|---|---|---|---|
| Document labeling & training | Define labels, annotate examples, prepare datasets, run training | A trained custom model ready for evaluation | Classify invoices vs. contracts; extract medication names from clinical notes |
| Performance evaluation | Split data, calculate precision/recall/F1, analyze confusion matrix | Metrics and error analysis guiding data/model improvements | Identify low-performing classes and add more labeled examples |
| Model deployment | Create REST endpoint, secure access, monitor predictions | Production endpoint for real-time inference and monitoring | Integrate into ingestion pipeline to tag documents on arrival |
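Once a model is deployed as a REST endpoint, calling it from an application typically means POSTing a JSON document and parsing the JSON reply. The sketch below uses only Python's standard library; the endpoint URL, the `Api-Key` header name, and the `documents` payload shape are placeholders, since each service defines its own request contract and authentication scheme.

```python
import json
import urllib.request

def build_payload(text: str, doc_id: str = "1") -> bytes:
    """Encode a single-document request body (illustrative schema)."""
    return json.dumps({"documents": [{"id": doc_id, "text": text}]}).encode("utf-8")

def classify_document(endpoint_url: str, api_key: str, text: str) -> dict:
    """POST a document to a deployed classification endpoint.

    The 'Api-Key' header and payload layout are hypothetical placeholders;
    substitute your service's actual schema and auth mechanism.
    """
    request = urllib.request.Request(
        endpoint_url,
        data=build_payload(text),
        headers={"Content-Type": "application/json", "Api-Key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating `build_payload` from the network call keeps the request schema testable without a live endpoint, which also helps when the pipeline later needs to log or replay requests for monitoring.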
Use prebuilt text analytics features when they meet your needs (e.g., general entity recognition for common named entities). Choose custom models when you must detect domain-specific entities or apply organization-specific classifications that prebuilt models cannot capture.
- Labeling guidelines: tips to create high-quality, consistent annotations (for example: label spans consistently, define clear label definitions, and include edge cases).
- Balanced datasets: approaches to handle class imbalance such as targeted labeling, data augmentation, or sampling strategies.
- Iterative evaluation: how to use metrics and error analysis to prioritize where to add more labeled data or adjust modeling choices.
- Monitoring after deployment: methods for tracking model drift, collecting real-world feedback, and scheduling re-training.
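The iterative-evaluation point above can be sketched in a few lines: compute per-class precision, recall, and F1 from parallel lists of gold and predicted labels, then look for classes with low recall as candidates for more labeled data. This is a plain-Python illustration of the standard definitions, not any particular evaluation library's API.

```python
from collections import Counter

def per_class_metrics(gold: list, predicted: list) -> dict:
    """Return {label: (precision, recall, f1)} for parallel label lists.

    Undefined ratios (no predictions or no gold examples for a label)
    fall back to 0.0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but wrong
            fn[g] += 1          # g was missed
    metrics = {}
    for label in set(gold) | set(predicted):
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[label] = (precision, recall, f1)
    return metrics
```

For example, a class whose recall is far below its precision is being missed rather than confused, which usually signals too few labeled examples of that class in the training set.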
- Introduction to Named Entity Recognition (NER): conceptual overview of entity extraction.
- Evaluation metrics for classification: a primer on precision, recall, and F1.
- Text analytics and custom model guidance: vendor documentation and examples for deploying text analytics solutions.