Mastering Generative AI with OpenAI

Moderating Prompts with the Moderation API

Moderation API

The OpenAI Moderation API helps you detect policy-violating, harmful, or unsafe content in user inputs before sending them to a language model. Integrating this check early in your pipeline ensures compliance, protects end users, and maintains the integrity of your application.

How the Moderation Endpoint Works

When you submit a prompt to the Moderation API, it returns a JSON payload whose results array contains, for each input, an object with three primary fields:

Field            Type     Description
flagged          boolean  true if any policy violation is detected; false otherwise
categories       object   A map of violation categories (e.g., hate, self-harm) to booleans
category_scores  object   Confidence scores (0.0–1.0) for each category

Example Request

curl https://api.openai.com/v1/moderations \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Your text to check for policy violations"
  }'
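
The equivalent request in Python, as a minimal sketch using the third-party requests package (the endpoint and payload mirror the curl command above):

import os
import requests

# Mirror of the curl request above: POST the text to the moderation endpoint.
response = requests.post(
    "https://api.openai.com/v1/moderations",
    headers={
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "Content-Type": "application/json",
    },
    json={"input": "Your text to check for policy violations"},
)
response.raise_for_status()
print(response.json())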

Example Response (abridged; the live API returns additional categories)

{
  "id": "modr-XXXXX",
  "model": "text-moderation-004",
  "results": [
    {
      "flagged": false,
      "categories": {
        "hate": false,
        "harassment": false,
        "self-harm": false
      },
      "category_scores": {
        "hate": 0.01,
        "harassment": 0.02,
        "self-harm": 0.00
      }
    }
  ]
}

Note

Use the confidence values in category_scores to prioritize human review of borderline cases.
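
As a concrete sketch of that triage, you might route near-threshold scores to a review queue. The score band and helper below are illustrative assumptions, not part of the API:

# Illustrative review band -- tune these thresholds for your own application.
REVIEW_LOW, REVIEW_HIGH = 0.4, 0.6

def triage(result):
    """Decide what to do with one entry from the 'results' array."""
    if result["flagged"]:
        return "reject"
    # Not flagged, but some score is close enough to warrant a human look.
    if any(REVIEW_LOW <= score <= REVIEW_HIGH
           for score in result["category_scores"].values()):
        return "human_review"
    return "accept"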

Integrating Moderation into Your Application Workflow

Adopt a secure, four-step flow to vet user inputs before content generation:

  1. Receive the user prompt.
  2. Call the Moderation API.
    • If flagged is true, return an error:
      “Your request violates our content policy and cannot be processed.”
    • If flagged is false, continue.
  3. Invoke the Generation API.
  4. Return the generated response to the end user.

# Pseudocode example (PolicyError and both API wrappers are placeholders)
class PolicyError(Exception):
    """Raised when a prompt violates the content policy."""

def handle_user_prompt(prompt):
    # Step 2: screen the prompt before any generation call.
    mod_result = call_moderation_api(prompt)
    if mod_result["flagged"]:
        raise PolicyError("Content policy violation detected.")
    # Steps 3-4: generate and return the response.
    return call_generation_api(prompt)
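
One way to flesh out those placeholder wrappers, sketched against the official openai Python package (v1+); the gpt-4o-mini model name is an assumption, so substitute whichever model your application uses:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def call_moderation_api(prompt):
    # Return the first (and here only) entry of the results array.
    resp = client.moderations.create(input=prompt)
    return {"flagged": resp.results[0].flagged}

def call_generation_api(prompt):
    # Step 3: only reached once moderation has passed.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; substitute your own
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content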

Warning

Always enforce the moderation step. Skipping it may expose your system to disallowed or harmful content.

Best Practices

  • Batch multiple inputs in a single moderation request to cut down on API round trips (see the sketch after this list).
  • Monitor and log flagged inputs for auditing and continuous policy tuning.
  • Adjust internal thresholds based on category_scores trends to minimize false positives.
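
For instance, the endpoint's input field also accepts a list of strings, so several prompts can be screened in one request. This sketch assumes the same openai client setup as above; the logging call is an illustrative stand-in for your audit pipeline:

import logging

from openai import OpenAI

client = OpenAI()
prompts = ["first user prompt", "second user prompt"]

# One request covers every prompt in the batch.
resp = client.moderations.create(input=prompts)

# Results come back in the same order as the inputs.
for prompt, result in zip(prompts, resp.results):
    if result.flagged:
        logging.warning("Flagged prompt for audit: %r", prompt)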
