Custom Named Entity Recognition (Custom NER) extends pre-built entity extraction by training a model to detect domain-specific entities such as product IDs, contract numbers, SKUs, clause types, brand names, currency pairs, or other business-specific labels. Use it when your application must identify specialized entities that general-purpose models do not cover. This guide follows the typical Custom NER lifecycle and shows how each stage appears in Azure AI Language Studio, including data connection, labeling, training, deployment, and evaluation.

High-level workflow

| Step | Purpose | Outcome |
| --- | --- | --- |
| Connect data | Point Language Studio to your document repository (Azure Storage) | Files available for annotation and training |
| Define entities | Create the labels your model should learn (e.g., PRODUCT_ID, CLAUSE_TYPE, FIN.CRYPTO) | Entity schema used for annotation and model output |
| Annotate documents | Manually tag text spans with entity labels | Labeled dataset for supervised training |
| Train model | Build a Custom NER model on labeled data (train/test split) | Trained model ready for evaluation |
| Deploy & test | Create an endpoint for real-time inference | Integrated model endpoint for applications |
| Evaluate & iterate | Inspect precision, recall, F1 and confusion patterns; add more labels or examples | Improved, production-ready model after iteration |

1) Connect data and define entities

Custom NER projects typically begin by giving Language Studio access to your documents stored in an Azure Storage container. Next, define the set of entities (labels) you want the model to recognize — for example: CITY, PRODUCT_ID, CLAUSE_TYPE, FIN.CRYPTO, FIN.FOREX, FIN.STOCK. These labels form the annotation schema used during manual labeling.
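One way to keep the schema explicit before labeling starts is to write it down as data and check it for obvious problems. The sketch below is only an illustration: the `EntityLabel` class, the `check_schema` helper, and the one-line descriptions are hypothetical and are not part of Language Studio, where labels are created in the UI.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EntityLabel:
    name: str         # label as it would appear in annotations and model output
    description: str  # one-line guideline for annotators (hypothetical wording)

# Hypothetical schema built from the labels named in this guide.
SCHEMA = [
    EntityLabel("CITY", "Names of cities, e.g. Seattle"),
    EntityLabel("PRODUCT_ID", "Internal product identifiers"),
    EntityLabel("CLAUSE_TYPE", "Contract clause categories"),
    EntityLabel("FIN.CRYPTO", "Cryptocurrency names and tickers"),
    EntityLabel("FIN.FOREX", "Currency pairs"),
    EntityLabel("FIN.STOCK", "Listed company names and tickers"),
]

def check_schema(schema):
    """Return a list of problems: duplicate names (case-insensitive) or empty descriptions."""
    problems = []
    seen = set()
    for label in schema:
        key = label.name.lower()
        if key in seen:
            problems.append(f"duplicate label: {label.name}")
        seen.add(key)
        if not label.description.strip():
            problems.append(f"missing description: {label.name}")
    return problems

print(check_schema(SCHEMA))  # prints [] when no problems are found
```

Keeping a short description next to each label doubles as a starting point for the annotation guideline that labelers follow.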

2) Annotate documents (labeling)

Manual annotation means selecting spans of text in sample documents and assigning each span the corresponding label. Here, the terms “Seattle” and “San Francisco” are being labeled as City entities in the labeling UI:
A screenshot of a "Custom Named Entity Recognition" data-labeling interface in Language Studio, showing a document with city names being annotated. "San Francisco" is selected for labeling and "Seattle" is tagged as a City in the activity pane.
High-quality, consistent annotations are the most important factor for good Custom NER performance. Include varied contexts and edge cases in your labeled data (abbreviations, punctuation, casing, and tokens that look similar but belong to different labels).
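Conceptually, each annotation is a labeled character span. The following sketch assumes a simple in-memory representation (a hypothetical `Span` tuple with start/end offsets) rather than the file format Language Studio stores, and shows two checks that support consistency: labeling every occurrence of a phrase, and detecting spans that overlap.

```python
from typing import List, NamedTuple, Tuple

class Span(NamedTuple):
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    label: str

def label_all(text: str, phrase: str, label: str) -> List[Span]:
    """Label every exact occurrence of `phrase` in `text`."""
    spans = []
    pos = text.find(phrase)
    while pos != -1:
        spans.append(Span(pos, pos + len(phrase), label))
        pos = text.find(phrase, pos + len(phrase))
    return spans

def find_overlaps(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return pairs of spans that share at least one character."""
    ordered = sorted(spans)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # spans are sorted by start, so no later span can overlap `a`
            overlaps.append((a, b))
    return overlaps

text = "Flights from Seattle to San Francisco resume Monday."
spans = label_all(text, "Seattle", "City") + label_all(text, "San Francisco", "City")
for s in spans:
    print(s.label, repr(text[s.start:s.end]))
print(find_overlaps(spans))  # no overlapping spans in this example
```

A check like this is mainly useful when several annotators work on the same documents and their labels are merged afterwards.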
In Language Studio, the Extract Information area lists multiple extractors. For custom scenarios, choose the Custom Named Entity project flow to begin building a Custom NER project.
A screenshot of the Azure AI Language Studio web interface showing projects and a toolbar. The main area displays cards for text-extraction features like Extract PII, Extract key phrases, Find linked entities, Extract named entities, and Extract health information.

3) Project setup and sample files

When creating a new Language Studio project, you may need to connect a storage account and grant permissions. Provide a project name, choose the primary language (e.g., English), and select the storage container with your sample files. Example: a container named “customner” holding the sample text files used for labeling and training:
A screenshot of the Microsoft Azure portal showing a storage container called "customner" with a list of blob files (e.g., sample_1.txt, sample_2.txt, etc.) and one CustomText file. The table shows each blob's modified timestamp, access tier (Hot), blob type (Block blob), size and lease state.

4) Data labeling view and entity creation

After files are loaded into the project, open the Data Labeling view. Create entity labels (for example: crypto, forex, stocks) and annotate documents by selecting spans and assigning the correct label. Save labels frequently to ensure training data is preserved and ready for model building.
A screenshot of the Azure AI Language Studio "Data labeling" page showing a list of sample text documents (sample_1.txt, sample_2.txt, etc.) ready for annotation and an Activity pane on the right for labels and entities. The left sidebar displays project navigation options like Auto-labeling, Training jobs, and Project settings.
Example annotations for a financial NER project:
  • “Google”, “Apple” → stocks
  • “Bitcoin”, “BTC” → crypto
  • “USDJPY”, “AUDJPY” → forex (also label variants that differ in punctuation, such as “USD/JPY”)
Save and review labels before training. If the dataset is small, add more annotated examples for underrepresented tokens.
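A quick count per label can show which entities are underrepresented before you train. This is a minimal sketch: the `annotations` list is made-up data mirroring the examples above, and the `min_examples` threshold is an arbitrary assumption, not a documented requirement.

```python
from collections import Counter

# Hypothetical (text, label) pairs mirroring the example annotations above.
annotations = [
    ("Google", "stocks"), ("Apple", "stocks"),
    ("Bitcoin", "crypto"), ("BTC", "crypto"), ("BTC", "crypto"),
    ("USDJPY", "forex"), ("AUDJPY", "forex"),
]

def underrepresented(annotations, labels, min_examples):
    """Return {label: count} for labels with fewer than `min_examples` annotations."""
    counts = Counter(label for _, label in annotations)
    return {label: counts[label] for label in labels if counts[label] < min_examples}

# The threshold of 3 is chosen only for illustration.
print(underrepresented(annotations, ["stocks", "crypto", "forex"], min_examples=3))
```

Labels that appear in the result are candidates for more annotated examples before the next training run.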

5) Train the Custom NER model

Create a training job in Language Studio, choose the labeled dataset and a train/test split (commonly 80/20), give the job a name, and start training. Training duration depends on dataset size and model configuration.
A screenshot of the Azure AI Language Studio "Training jobs" page showing the Start a training job form for a custom named entity recognition project. It shows options to train a new model (name field), an 80/20 training/testing data split, and a highlighted "Train" button.
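Language Studio performs the split for you, but the idea behind an 80/20 split can be sketched locally. The function below is an illustration under stated assumptions: `split_documents`, the seed value, and the sample file names are hypothetical, and the service's own splitting logic may differ.

```python
import random

def split_documents(names, test_fraction=0.2, seed=13):
    """Shuffle document names reproducibly and hold out a test fraction.

    Returns (train, test). At least one document is always held out.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = sorted(names)               # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)  # seeded shuffle keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

docs = [f"sample_{i}.txt" for i in range(1, 11)]
train, test = split_documents(docs)
print(len(train), len(test))  # 8 2
```

Fixing the seed means the same documents stay in the test set across retraining runs, which keeps metrics comparable between iterations.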

6) Evaluate, deploy, and iterate

After training, Language Studio surfaces evaluation metrics such as precision, recall, and F1 score, including per-entity breakdowns. Inspect confusion patterns: for instance, if the model mistakenly classifies uncommon currency pairs as stocks, add labeled examples for those pairs and retrain.
If metrics meet your requirements, deploy the trained model to create an inference endpoint (for example, named FinDep). You can then test predictions directly in Language Studio or call the endpoint programmatically from your application.
Notes from common demos:
  • Tokens that appear frequently in training data (e.g., BTC) are often recognized correctly.
  • Unseen or rare tokens (e.g., an uncommon currency pair) may be misclassified; collect and label more examples for those tokens to improve accuracy.
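Calling the deployed endpoint from an application might look like the sketch below. It assumes the `azure-ai-textanalytics` Python package and its `begin_recognize_custom_entities` method; the endpoint URL, key, and project name are placeholders you would replace, and the helper names `recognize_custom_entities` and `make_client` are hypothetical.

```python
def recognize_custom_entities(client, documents, project_name, deployment_name):
    """Run custom NER on `documents`; return one list of entity dicts per document.

    Documents that the service reports as errors yield an empty list.
    """
    poller = client.begin_recognize_custom_entities(
        documents, project_name=project_name, deployment_name=deployment_name
    )
    results = []
    for doc in poller.result():
        if doc.is_error:
            results.append([])
            continue
        results.append([
            {"text": e.text, "category": e.category, "confidence": e.confidence_score}
            for e in doc.entities
        ])
    return results

def make_client(endpoint, key):
    """Build a client; imports are local so the helper above works without the SDK installed."""
    from azure.ai.textanalytics import TextAnalyticsClient
    from azure.core.credentials import AzureKeyCredential
    return TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

# Example usage (placeholders, not real values):
# client = make_client("https://<your-resource>.cognitiveservices.azure.com/", "<your-key>")
# print(recognize_custom_entities(
#     client, ["BTC rallied while USDJPY fell."], "<your-project-name>", "FinDep"))
```

Returning plain dictionaries keeps the rest of the application independent of the SDK's result types, and the confidence score lets you filter low-confidence predictions for rare tokens.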

Best practices for production-ready Custom NER

  • Label consistently and define a clear annotation guideline for all annotators.
  • Include diverse examples: casing, punctuation, abbreviations, and context variations.
  • Ensure balanced representation of each entity; use augmentation if needed.
  • Monitor per-entity metrics and confusion matrices to find weak spots.
  • Iterate: add challenging examples and retrain until performance stabilizes.
  • Hold out a test set (an 80/20 or 70/30 train/test split is common) to measure generalization and detect overfitting.
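The per-entity metrics and confusion patterns mentioned in the list above can also be computed offline from gold and predicted spans. This sketch assumes exact-match scoring on hypothetical `(doc_id, start, end, label)` tuples; the data is made up, and Language Studio's own scoring rules may differ in detail.

```python
from collections import Counter

def per_label_scores(gold, predicted):
    """Exact-match precision, recall, and F1 per label.

    `gold` and `predicted` are sets of (doc_id, start, end, label) tuples.
    """
    labels = {g[3] for g in gold} | {p[3] for p in predicted}
    scores = {}
    for label in sorted(labels):
        g = {x for x in gold if x[3] == label}
        p = {x for x in predicted if x[3] == label}
        tp = len(g & p)  # same document, same boundaries, same label
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores

def confusions(gold, predicted):
    """Count spans with correct boundaries but a wrong label, keyed by (gold_label, predicted_label)."""
    gold_by_span = {x[:3]: x[3] for x in gold}
    pairs = Counter()
    for doc_id, start, end, label in predicted:
        gold_label = gold_by_span.get((doc_id, start, end))
        if gold_label is not None and gold_label != label:
            pairs[(gold_label, label)] += 1
    return pairs

# Made-up example: one forex span is predicted as stocks.
gold = {("d1", 0, 3, "crypto"), ("d1", 10, 16, "forex"),
        ("d2", 0, 6, "forex"), ("d2", 12, 17, "stocks")}
predicted = {("d1", 0, 3, "crypto"), ("d1", 10, 16, "forex"),
             ("d2", 0, 6, "stocks"), ("d2", 12, 17, "stocks")}

for label, (p, r, f1) in per_label_scores(gold, predicted).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(confusions(gold, predicted))
```

In this example the forex label loses recall and the stocks label loses precision, which is the pattern that suggests adding more labeled currency-pair examples.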

Summary

  • Custom NER trains models to detect domain-specific entities that pre-built extractors do not cover.
  • Key steps: connect data, define entities, annotate documents, train, deploy, evaluate, and iterate.
  • High-quality, varied annotations and iterative retraining drive production-grade accuracy.
