Guide for building, training, and deploying custom text classification models using Azure Language Studio including data labeling, evaluation, deployment, SDK examples, and best practices.
Custom Text Classification

This guide walks through the end-to-end process for building a custom text classification model in Azure Language Studio — from connecting your data to deploying a trained model for inference. Follow these steps to create accurate, production-ready classifiers for scenarios like article tagging, support-ticket routing, or internal document classification.

Overview — pipeline steps
Data connection: connect your stored documents to Language Studio.
Class definition: define the set of labels (single-label or multi-label).
Label assignment: tag documents with the appropriate class labels.
Model training: train and evaluate the classifier.
Deployment: deploy the model as an endpoint for integration.
Step 1 — Data connection and class definition
Start by integrating the documents you want to classify (e.g., news articles, support tickets, internal docs) with Azure Language Studio. Clean, relevant data improves model quality, so filter out noisy files and ensure the content represents the categories you intend to predict.

Next, define your classes — these are the categories the model will predict. Examples in this article include Arts, Entertainment, and Sports. Add enough representative examples per class to enable effective learning. You can add, rename, or remove classes later in the project settings.

Step 2 — Label assignment
Assign labels to each document to build the supervised dataset used for training. Label consistently: similar content should receive the same tag (e.g., a match report → Sports; a movie review → Entertainment). The portal shows dataset status and whether files are assigned to training or test sets. Proper and consistent labeling is critical to good results.

Step 3 — Training and evaluation
After labeling, start a training job. Choose a data split (commonly 80/20 or 70/30) to hold out evaluation data and monitor generalization. The training process learns patterns that map text to classes and provides an evaluation report on the held-out set to estimate expected performance.

Step 4 — Deployment
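The idea of holding out evaluation data can be sketched with a generic shuffled split. This is an illustration of the 80/20 concept, not Language Studio's internal split logic:

```python
import random

def train_test_split(documents, test_ratio=0.2, seed=42):
    """Shuffle documents and hold out a fraction for evaluation."""
    rng = random.Random(seed)
    shuffled = documents[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

docs = [f"article-{i}" for i in range(10)]
train, test = train_test_split(docs)
print(len(train), len(test))  # 8 2
```

Seeding the shuffle keeps the split reproducible across runs, which matters when you compare models trained on the same data.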
When satisfied with evaluation metrics, deploy the model as an endpoint (REST + SDK support). Choose a deployment name and resource region. Deployed models become callable from your applications for real-time or batch classification.
Using Language Studio for Custom Text Classification

This section provides a practical walkthrough inside Azure Language Studio. Open Language Studio and choose “Classify text (Custom text classification).” The studio includes pre-built features (Analyze sentiment, Detect language), but here we focus on creating a custom classification project.
Create a project
Click “Create a project”.
Select the Azure Language resource to back the project.
If this is your first time, attach a storage account (documents used for labeling are uploaded to blob storage). Grant the Storage Blob Data Contributor role (or an equivalent RBAC role) on that storage account so Language Studio can access the files for labeling and training.
During creation you choose:
single-label classification (one category per document) or
multi-label classification (documents can belong to multiple categories).
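The practical difference between the two modes is how each labeled example is represented: exactly one class per document versus a set of classes. A minimal sketch (illustrative data structures, not the studio's storage format):

```python
# Single-label: each document maps to exactly one class
single_label = {
    "match_report.txt": "Sports",
    "movie_review.txt": "Entertainment",
}

# Multi-label: each document maps to a set of classes
multi_label = {
    "oscars_recap.txt": {"Entertainment", "Arts"},
    "gallery_opening.txt": {"Arts"},
}

def labels_for(doc, mapping):
    """Normalize lookups so both modes return a set of labels."""
    value = mapping[doc]
    return {value} if isinstance(value, str) else set(value)

print(labels_for("match_report.txt", single_label))
print(labels_for("oscars_recap.txt", multi_label))
```

Choose the mode up front: it affects how you label, how the model is evaluated, and the shape of the prediction output.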
In the example here, we select single-label and name the project “Article Classification,” with English as the primary language. Also select the target storage container where labeled files reside.
Choose the container and finish the project setup.
Label the data
Add classes (e.g., Arts, Entertainment, Sports) and begin labeling each document. The portal indicates dataset readiness and shows segmentation into training and test sets. For meaningful performance, use as many varied, labeled examples as possible — dozens to hundreds per label is common for higher accuracy.
Auto-labeling (preview)
Language Studio can suggest labels using auto-labeling (generative models) to help bootstrap large datasets. Always review and correct auto-labeled items before training to avoid propagating errors.
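One practical review workflow is to auto-accept only high-confidence suggestions and queue the rest for manual checking. This is a sketch of that triage step; the record shape and the 0.8 threshold are assumptions, not part of the studio:

```python
def triage_auto_labels(suggestions, threshold=0.8):
    """Split auto-label suggestions into auto-accepted and needs-review buckets."""
    accepted, review = [], []
    for s in suggestions:
        (accepted if s["confidence"] >= threshold else review).append(s)
    return accepted, review

suggestions = [
    {"doc": "a.txt", "label": "Sports", "confidence": 0.95},
    {"doc": "b.txt", "label": "Arts", "confidence": 0.55},
]
accepted, review = triage_auto_labels(suggestions)
print([s["doc"] for s in accepted], [s["doc"] for s in review])
```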
Start a training job
Choose a model name.
Configure data split (for example, 80% training / 20% testing).
Start training and wait for completion; review evaluation metrics and logs on the Training jobs page.
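The evaluation report centers on standard precision, recall, and F1. Per-class values can be computed from prediction counts as below; these are the generic formulas, not necessarily the studio's exact report layout:

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Example: the "Sports" class on a held-out set
p, r, f = prf1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Low precision on a class usually means other classes' documents are being pulled into it; low recall usually means that class needs more (or more varied) labeled examples.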
Example: calling the model from Python (SDK)
Below is a compact Python example showing how to classify a local text file using the Text Analytics client. Note: import only the client and AzureKeyCredential — some older examples import non-existent action classes (e.g., SingleCategoryClassifyAction), which will raise ImportError.
```python
# example: classify a local document using the Text Analytics SDK
from azure.ai.textanalytics import TextAnalyticsClient
from azure.core.credentials import AzureKeyCredential

def classify_local_document(file_path, endpoint, key, project_name, deployment_name):
    # Initialize client
    client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key))

    # Read the document to classify
    with open(file_path, "r", encoding="utf-8") as f:
        document_text = f.read()

    # Start the classification job (single-label in this example)
    poller = client.begin_single_label_classify(
        documents=[document_text],
        project_name=project_name,
        deployment_name=deployment_name,
    )

    # Wait for result
    results = poller.result()

    # results is an iterable of document results
    for doc in results:
        if not doc.is_error:
            # The SDK may expose classifications in either 'classifications' or 'classification'
            classifications = getattr(doc, "classifications", None) or getattr(doc, "classification", None)
            if classifications:
                # If it's a list of classifications, print them
                if isinstance(classifications, list):
                    for c in classifications:
                        print(f"Predicted Label: {c.category}")
                        print(f"Confidence Score: {c.confidence_score:.2f}")
                else:
                    # single classification object
                    print(f"Predicted Label: {classifications.category}")
                    print(f"Confidence Score: {classifications.confidence_score:.2f}")
            else:
                print("No classifications returned for this document.")
        else:
            print(f"Error: {doc.error.code} - {doc.error.message}")
```
If you see an ImportError such as:
ImportError: cannot import name 'SingleCategoryClassifyAction' from 'azure.ai.textanalytics'
it means the code tried to import a non-existent action class. Import only TextAnalyticsClient and AzureKeyCredential and call the classification method on the client, as shown above.
Deploy the model
After successful training and evaluation:
Add a deployment (give it a name and choose region/resource).
The portal will provide a prediction URL and SDK code snippets to call the endpoint.
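If you call the REST route directly instead of the SDK, the request is a POST that submits an analyze-text job whose task kind names your custom classification project. The payload below follows the documented Language API job format, but confirm the exact api-version, endpoint path, and field names against the snippet the portal generates; the project and deployment names here are placeholders:

```python
import json

def build_classification_request(text, project_name, deployment_name):
    """Build the JSON body for a custom single-label classification job."""
    return {
        "displayName": "Classify document",
        "analysisInput": {
            "documents": [{"id": "1", "language": "en", "text": text}]
        },
        "tasks": [
            {
                "kind": "CustomSingleLabelClassification",
                "taskName": "classify",
                "parameters": {
                    "projectName": project_name,
                    "deploymentName": deployment_name,
                },
            }
        ],
    }

body = build_classification_request(
    "Argentina won the 2022 FIFA World Cup.", "Article Classification", "production"
)
print(json.dumps(body, indent=2))
```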
Test from the portal
Language Studio includes a quick test UI. Example results from the demo:
Input: “Argentina won the 2022 FIFA World Cup.”
Output: Predicted “Sports” (confidence ~0.39)
Input: “Avengers is a great movie.”
Output: Predicted “Entertainment”
The portal also displays the raw JSON output for each prediction.
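An illustrative single-label response body, and how to pull the top prediction out of it. The nesting and field names follow the documented job-result format as I understand it, so verify them against the raw JSON your own portal shows:

```python
# Illustrative response for a completed single-label classification job
response = {
    "tasks": {
        "items": [
            {
                "results": {
                    "documents": [
                        {
                            "id": "1",
                            "class": [
                                {"category": "Sports", "confidenceScore": 0.39}
                            ],
                        }
                    ]
                }
            }
        ]
    }
}

def top_prediction(resp):
    """Pull the first document's top class and score out of the job result."""
    doc = resp["tasks"]["items"][0]["results"]["documents"][0]
    top = doc["class"][0]
    return top["category"], top["confidenceScore"]

print(top_prediction(response))  # ('Sports', 0.39)
```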
Train with domain-specific examples and iterate — more high-quality labeled data and consistent labeling practices yield the best classification performance.