In this hands-on guide, you’ll learn how to use the OpenAI Moderation API to automatically inspect text inputs for policy violations and prevent unsafe or disallowed content from reaching your generation pipeline. By embedding a quick moderation check before calling the generation endpoint, you can maintain safe, compliant, and high-quality interactions with large language models.
Table of Contents
- Installation & Setup
- Basic Moderation Check
- Understanding the Moderation Response
- Handling Policy Violations
- Example: Self-Harm Prompt
- Best Practices & Summary
1. Installation & Setup
Before you start, ensure you have Python 3.7+ installed. Set your API key in an environment variable, rather than hard-coding it, to keep it secure.
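The setup can be verified from Python before any request is made. The following is a minimal sketch, assuming the official `openai` package is installed (for example with `pip install openai`) and that the key lives in `OPENAI_API_KEY`, the variable the SDK reads by default. `load_api_key` is a hypothetical helper name, not part of the SDK.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the API key from the environment or fail with a clear error."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Failing early is easier to debug than an authentication error
        # raised later by the first API call.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key
```

Calling `load_api_key()` once at startup surfaces a missing key immediately instead of at the first request.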
2. Basic Moderation Check
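The check in this step is a single request to the moderation endpoint. The following is a minimal sketch, assuming the v1 `openai` Python SDK, where `client.moderations.create(input=...)` returns an object with a `results` list. `make_client` and `is_flagged` are hypothetical helper names.

```python
from typing import Any


def make_client() -> Any:
    """Create an SDK client; it reads OPENAI_API_KEY from the environment."""
    # Imported lazily so is_flagged() can be exercised without the SDK installed.
    from openai import OpenAI
    return OpenAI()


def is_flagged(client: Any, text: str) -> bool:
    """Send one text to the moderation endpoint and return its flagged value."""
    response = client.moderations.create(input=text)
    # One input string yields one entry in results.
    return bool(response.results[0].flagged)
```

With a key configured, `is_flagged(make_client(), prompt)` performs the live check. The result depends on the moderation model, so no output is shown here.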
Start with a harmless prompt (e.g., a travel itinerary). This step ensures you only proceed with clean inputs.
3. Understanding the Moderation Response
The Moderation API returns a JSON object; each entry in its results list has three key parts:

| Field | Type | Description |
|---|---|---|
| flagged | boolean | true if any category violates policy |
| categories | object | Which violation categories were triggered (booleans) |
| category_scores | object | Model confidence score for each category (0 to 1) |
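To make the three fields concrete, the sketch below reads them from a single moderation result. It assumes the result is either a plain dict (parsed JSON) or an SDK object whose nested fields expose `model_dump()`, as the pydantic-based models in the v1 Python SDK do. The helper names are hypothetical, and the sample values are illustrative rather than real API output.

```python
from typing import Any, Dict, List


def _as_dict(obj: Any) -> Dict[str, Any]:
    """Return a plain dict from a dict or a pydantic-style object."""
    if isinstance(obj, dict):
        return dict(obj)
    return obj.model_dump()  # assumption: SDK models are pydantic-based


def _field(result: Any, name: str) -> Any:
    """Read a field from either a dict or an attribute-style object."""
    return result[name] if isinstance(result, dict) else getattr(result, name)


def summarize(result: Any) -> Dict[str, Any]:
    """Collect flagged, the triggered category names, and the scores."""
    categories = _as_dict(_field(result, "categories"))
    scores = _as_dict(_field(result, "category_scores"))
    triggered: List[str] = sorted(k for k, v in categories.items() if v)
    return {
        "flagged": bool(_field(result, "flagged")),
        "triggered": triggered,
        "scores": scores,
    }


# Illustrative values only; real responses contain more categories.
sample = {
    "flagged": True,
    "categories": {"violence": True, "harassment": False},
    "category_scores": {"violence": 0.91, "harassment": 0.02},
}
summary = summarize(sample)
print(summary["flagged"], summary["triggered"])
```

Normalizing to plain dicts keeps the rest of the pipeline independent of whether the result came from the SDK or from raw JSON.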
If flagged is false, you can skip deep inspection and call the generation API directly.
4. Handling Policy Violations
When the API flags a prompt (flagged: true), you should:
- Identify the highest-scoring category from category_scores.
- Inform the user or sanitize the input.
- Log or audit the incident for compliance.
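The three steps above can be sketched as one small handler. This is an illustration under stated assumptions rather than a prescribed implementation: `scores` is the `category_scores` field already converted to a plain dict, and `top_category`, `handle_violation`, the logger name, and the user-facing wording are all hypothetical.

```python
import logging
from typing import Dict, Tuple

logger = logging.getLogger("moderation")


def top_category(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (name, score) pair with the highest score."""
    if not scores:
        raise ValueError("category_scores is empty")
    name = max(scores, key=lambda k: scores[k])
    return name, scores[name]


def handle_violation(prompt: str, scores: Dict[str, float]) -> str:
    """Log the incident and return a message to show the user."""
    name, score = top_category(scores)
    # Audit trail: record the prompt together with the deciding category.
    logger.warning("prompt flagged: category=%s score=%.3f prompt=%r",
                   name, score, prompt)
    return (f"Your request was flagged for '{name}' and could not be "
            "processed. Please rephrase and try again.")
```

Where prompts may contain sensitive text, the log line could record a hash or length instead of the raw prompt; which is appropriate depends on your compliance requirements.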
5. Example: Self-Harm Prompt
Let’s test a harmful input: a prompt that expresses an intent to self-harm. Always stop the generation pipeline when the API returns flagged: true to prevent unsafe content.
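Stopping the pipeline on a flagged prompt can be sketched as a gate placed in front of the generation step. The sketch assumes a v1 `openai` SDK client passed in as `client`. `moderated_generate`, `REFUSAL`, and the `generate` callback are hypothetical names standing in for your own generation call.

```python
from typing import Any, Callable

REFUSAL = "Sorry, this request can't be processed."


def moderated_generate(client: Any, prompt: str,
                       generate: Callable[[str], str]) -> str:
    """Run moderation first; call generate only for unflagged prompts."""
    response = client.moderations.create(input=prompt)
    result = response.results[0]
    if result.flagged:
        # Stop here: the prompt never reaches the generation step.
        return REFUSAL
    return generate(prompt)
```

Passing the generation step in as a callback keeps the gate independent of which model or endpoint produces the final text.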
6. Best Practices & Summary
- Pre-filter all user inputs with the Moderation API before any generation call.
- Log flagged prompts along with category scores for audit and tuning.
- Gracefully inform users when their request is disallowed.
- Keep your API key secure using environment variables or a secrets manager.