Mastering Generative AI with OpenAI
Moderating Prompts with the Moderation API
Section Intro: Integrating Safety with the Moderation API
In this final section of the course, you’ll learn how to integrate robust safety checks into your AI-driven applications using the OpenAI Moderation API. We’ll cover:
- Overview of the Moderation API: Key features and capabilities for filtering harmful or inappropriate content.
- Use Cases & Best Practices: Common scenarios, configuration tips, and threshold management.
- Hands-On Demo: Step-by-step guide to implementing a prompt-filtering workflow in your application.
By the end of this lesson, you’ll be equipped to leverage the Moderation API to ensure that your generative AI features only process content that complies with your organization’s safety guidelines.
Note
Before you begin, make sure your OpenAI API key has access to the Moderation endpoint. You can verify your access in the OpenAI Dashboard.
What Is the Moderation API?
The Moderation API scans input text and returns a risk assessment across categories like violence, hate speech, self-harm, and sexual content. Its score-based output (a sample response is sketched after this list) allows you to:
- Automatically block or flag high-risk content
- Provide real-time feedback to users
- Log and audit moderation decisions for compliance
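For orientation, the abbreviated sketch below shows the rough shape of a moderation result as returned by the Node SDK. The category names match the API, but the specific flag values, scores, IDs, and model name shown here are made up for illustration.

```javascript
// Abbreviated, illustrative shape of a Moderation API response.
// Category names follow the API; the flag values, scores, and model name are made up.
const exampleModerationResponse = {
  id: "modr-...",
  model: "text-moderation-latest",
  results: [
    {
      flagged: true,
      categories: {
        "hate": false,
        "self-harm": true,
        "sexual": false,
        "violence": false
        // ...plus further categories such as "harassment" and "violence/graphic"
      },
      category_scores: {
        "hate": 0.0001,
        "self-harm": 0.97,
        "sexual": 0.0002,
        "violence": 0.0003
        // ...a 0–1 score for each category
      }
    }
  ]
};
```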
Typical Use Cases
| Use Case | Description | Example |
| --- | --- | --- |
| User-Generated Content | Prevent posting of disallowed text in comments, chats, or forums | Reject or review flagged chat messages |
| Content Creation Platforms | Ensure AI-generated summaries, articles, or images are safe | Block self-harm instructions in summaries |
| Moderated AI Assistants | Safeguard AI assistants from generating harmful or sensitive advice | Filter prompts before forwarding to the model |
Demo: Filtering Harmful Prompts
Below is a simplified JavaScript example demonstrating how to call the Moderation API and handle its response:
```javascript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function moderatePrompt(userInput) {
  // Send the raw user input to the Moderation endpoint.
  const response = await openai.moderations.create({
    input: userInput
  });

  // Each input produces one result with an overall flag and per-category booleans.
  const { flagged, categories } = response.results[0];

  if (flagged) {
    // Log only the categories that were actually triggered.
    console.warn("Prompt flagged for:", Object.keys(categories).filter(cat => categories[cat]));
    return { allowed: false, reason: "Content requires review." };
  }

  return { allowed: true };
}

// Usage example
const userPrompt = "Explain how to perform self-harm";
moderatePrompt(userPrompt).then(result => {
  if (!result.allowed) {
    console.log(result.reason);
  } else {
    // Proceed with AI generation
  }
});
```
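If the prompt passes moderation, you can hand it off to a chat model. The sketch below shows one way to wire the two calls together, reusing `moderatePrompt` and the `openai` client from the demo; the model name and system message are placeholders rather than part of the course's demo.

```javascript
// Sketch: moderate first, then generate only if the prompt is allowed.
// The model name ("gpt-4o-mini") and system message are illustrative placeholders.
async function safeGenerate(userPrompt) {
  const moderation = await moderatePrompt(userPrompt);

  if (!moderation.allowed) {
    // Surface the moderation decision instead of calling the model.
    return { error: moderation.reason };
  }

  const completion = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: userPrompt }
    ]
  });

  return { text: completion.choices[0].message.content };
}
```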
Warning
Relying solely on automated moderation can lead to false positives or negatives. Always combine the API with human review for critical applications.
Next Steps
- Integrate this moderation function into your front-end or back-end pipeline.
- Customize category thresholds based on your risk tolerance and user base (see the sketch after this list).
- Log moderation events for analytics and compliance reporting.
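As a starting point for the threshold customization mentioned above, the sketch below layers per-category score cutoffs on top of the default `flagged` flag, reusing the `openai` client from the demo. The specific categories and cutoff values are assumptions; tune them to your own risk tolerance.

```javascript
// Sketch: stricter, score-based filtering in addition to the boolean `flagged` field.
// The categories and cutoffs below are illustrative assumptions, not recommended values.
const CATEGORY_THRESHOLDS = {
  "self-harm": 0.2,
  "violence": 0.5,
  "hate": 0.4
};

async function moderateWithThresholds(userInput) {
  const response = await openai.moderations.create({ input: userInput });
  const { flagged, category_scores: scores } = response.results[0];

  // Collect every configured category whose score meets or exceeds its cutoff.
  const exceeded = Object.entries(CATEGORY_THRESHOLDS)
    .filter(([category, threshold]) => (scores[category] ?? 0) >= threshold)
    .map(([category]) => category);

  if (flagged || exceeded.length > 0) {
    return { allowed: false, categories: exceeded };
  }
  return { allowed: true };
}
```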
By following these best practices, you’ll ensure your AI application maintains high standards of safety and user trust.