Implement Retrieval Augmented Generation (RAG) with Azure OpenAI Service
Working with Custom Data Sources
Guide to grounding Azure OpenAI with custom data sources using RAG, including chat playground, Blob Storage indexing, REST and SDK patterns, and a Python end to end example.
This guide explains how to ground Azure OpenAI responses with your own documents (Retrieval-Augmented Generation — RAG). It covers the Chat playground workflow for quickly testing data grounding, an end-to-end demo indexing a PDF from Azure Blob Storage, REST and SDK integration patterns, and a production-minded Python example.

Why ground models with your data?
Prevent hallucinations by giving models verifiable context.
Surface organization-specific knowledge not present in base models.
Enable citations so answers include traceable sources.
Quick workflow: Chat playground (no code required)
The Azure OpenAI Studio chat playground provides a central UI for composing prompts, selecting deployments, and adding data sources so the assistant can reference documents you control. Use this for rapid iteration before implementing code.

Steps to connect a data source from the Chat playground:
Open Chat playground in Azure OpenAI Studio.
Click Add your data.
Choose an existing data source (for example, Azure AI Search index) or create one from the dialog.
After adding, a new chat session is created and grounded in that data — the model can integrate content from your documents and cite sources.
This lets you move quickly from prototype to a deployment-ready configuration.
Scenario: You have Project_Orion_Confidential.pdf stored in Blob Storage and want the assistant to answer questions using the PDF content.
Before ingestion, the assistant will answer from general model knowledge. To enforce that responses only come from the indexed document content, set a system message telling the model not to guess. Example system instruction:
“Answer only if you find relevant content in the data source. Do not guess. If unsure, say: ‘I don’t have information on that topic.’”

After adding that system message, queries about Project Orion will return “I don’t have information…” until you ingest and index the PDF.
In the Add data dialog choose Azure Blob Storage and point to the container with your file.
Select an Azure AI Search (formerly Azure Cognitive Search) resource — this service builds the index and returns semantically ranked documents to Azure OpenAI.
Provide an index name (for example, rag) and configure authentication (API key or managed identity).
Save and let the platform ingest and index the document. A status indicator shows ingestion progress; once complete, the chat session can return citations from the indexed file.
Once indexed, asking about Project Orion returns answers that include citations (for example: “Project Orion Confidential — Part One”) and specific content such as lead researcher names.
Each REST request to the Azure OpenAI endpoint for RAG-enabled interactions should include a data_sources array. This tells the model where to look for external content (for example an Azure Cognitive Search index).
Authentication for the data source is managed through the search resource (or other data service), not the Azure OpenAI resource. Ensure the access method you pick (API key or managed identity) is configured correctly and the identity has appropriate permissions.
Example RAG-enabled REST request body (replace placeholders with your values):
```
POST https://<your_openai_endpoint>/openai/deployments/<deployment>/chat/completions?api-version=<api_version>
Content-Type: application/json
api-key: <your_api_key>

{
  "data_sources": [
    {
      "type": "azure_search",
      "parameters": {
        "endpoint": "https://<your_search_endpoint>",
        "index_name": "<your_search_index>",
        "authentication": { "type": "system_assigned_managed_identity" },
        "semantic_configuration": "default",
        "query_type": "simple",
        "top_n_documents": 5,
        "strictness": 3,
        "role_information": "Answer only if you find relevant content in the data source. Do not guess. If unsure, say: \"I don't have information on that topic.\""
      }
    }
  ],
  "messages": [
    {
      "role": "system",
      "content": "Answer only if you find relevant content in the data source. Do not guess. If unsure, say: \"I don't have information on that topic.\""
    },
    {
      "role": "user",
      "content": "Who are the lead researchers for Project Orion?"
    }
  ],
  "past_messages": 10,
  "temperature": 0.0,
  "max_tokens": 800
}
```
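The same request can be issued from code. The sketch below builds the request body and posts it with only the Python standard library; the endpoint, deployment, index name, and key values are placeholders you must supply, and the exact parameter set mirrors the REST body above rather than an exhaustive schema.

```python
import json
import urllib.request


def build_rag_payload(question: str, search_endpoint: str, index_name: str) -> dict:
    """Build a chat-completions body with an attached Azure AI Search data source."""
    instruction = (
        "Answer only if you find relevant content in the data source. Do not guess. "
        'If unsure, say: "I don\'t have information on that topic."'
    )
    return {
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": search_endpoint,
                "index_name": index_name,
                "authentication": {"type": "system_assigned_managed_identity"},
                "query_type": "simple",
                "top_n_documents": 5,
            },
        }],
        "messages": [
            {"role": "system", "content": instruction},
            {"role": "user", "content": question},
        ],
        "temperature": 0.0,
        "max_tokens": 800,
    }


def post_chat_completion(openai_endpoint, deployment, api_version, api_key, payload):
    """POST the payload to the chat-completions endpoint and return the answer text."""
    url = (f"{openai_endpoint}/openai/deployments/{deployment}"
           f"/chat/completions?api-version={api_version}")
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the payload is just a dictionary, you can unit-test its construction without any network access, then wire `post_chat_completion` to your real endpoint.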
Azure OpenAI SDKs (used with Azure OpenAI deployments) simplify integration in languages like Python and C#. Even when using the SDK you still supply a messages array and a data source object to indicate where to find ground-truth content.

Typical steps:
Install the Azure OpenAI/OpenAI SDK for your language.
Create a client (API key or identity-based auth).
Define chat messages (system + user).
Attach a data_sources object (endpoint, index, authentication).
Send the request and process the response.
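The steps above can be sketched in Python with the `openai` package's AzureOpenAI client. Because `data_sources` is an Azure-specific field rather than a standard OpenAI parameter, it is passed through `extra_body`. The environment variable names, the `rag` index name, and the API version are assumptions for illustration:

```python
import os


def make_azure_search_source(search_endpoint: str, index_name: str) -> dict:
    """Describe the index Azure OpenAI should query on our behalf (service-side retrieval)."""
    return {
        "type": "azure_search",
        "parameters": {
            "endpoint": search_endpoint,
            "index_name": index_name,
            "authentication": {"type": "system_assigned_managed_identity"},
        },
    }


# The live call only runs when credentials are configured in the environment.
if os.getenv("AZURE_OPENAI_API_KEY"):
    from openai import AzureOpenAI  # pip install openai

    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_version="2024-02-15-preview",
    )
    response = client.chat.completions.create(
        model=os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o"),
        messages=[
            {"role": "system", "content": "Answer only from the data source. Do not guess."},
            {"role": "user", "content": "Who are the lead researchers for Project Orion?"},
        ],
        # data_sources is Azure-specific, so it travels in extra_body
        extra_body={"data_sources": [
            make_azure_search_source(os.environ["SEARCH_ENDPOINT"], "rag")
        ]},
        temperature=0.0,
    )
    print(response.choices[0].message.content)
```

With this pattern the service performs retrieval and generation in one call; your code never touches the search index directly.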
Supported data sources (SDK)
Azure AI Search — primary index for grounding (GA).
Azure Cosmos DB for MongoDB vCore — document-level grounding (preview).
More connectors are being added — check Azure release notes for updates.
Table: Common data-source options
| Resource Type | Typical Use Case | Availability |
| --- | --- | --- |
| Azure AI Search | Semantic indexes for document retrieval and ranking | Generally available |
| Azure Blob Storage | Raw documents (PDF, DOCX) used by the search indexer | Works with the search indexer |
| Azure Cosmos DB for MongoDB vCore | Document-level grounding | Preview; check release notes |
Tip: pick a connector that matches where your documents live and your required retrieval semantics.
When to let the service retrieve vs. app-side retrieval
Two common patterns:
Service-side retrieval: include data_sources in the API call and let Azure OpenAI + Azure Cognitive Search handle retrieval + generation. Simpler; less client code.
App-side retrieval: query Cognitive Search from your app, select top-K docs, and include the content in messages. Offers more control over retrieval, filtering, and privacy.
Both patterns are valid; pick the one that meets your latency, cost, and security requirements.
Authentication for data sources is tied to the search/data resource, not the Azure OpenAI resource. Ensure the identity (API key or managed identity) you configure has appropriate permissions to access the search index or storage.
This simplified Python example shows one common pattern:
Query Azure Cognitive Search to get top-K documents.
Format these documents into a context.
Call Azure OpenAI chat completions with messages containing the context.
Print the assistant response.
Replace placeholders with your environment values and use secure credential management in production.
```python
# app.py
import os

from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Configuration (use environment variables in production)
AZURE_OPENAI_ENDPOINT = os.getenv("AZURE_OPENAI_ENDPOINT", "https://<your_openai_endpoint>")
AZURE_OPENAI_API_KEY = os.getenv("AZURE_OPENAI_API_KEY", "<your_openai_api_key>")
DEPLOYMENT_NAME = os.getenv("AZURE_OPENAI_DEPLOYMENT", "gpt-4o")
SEARCH_ENDPOINT = os.getenv("SEARCH_ENDPOINT", "https://<your_search_endpoint>")
SEARCH_KEY = os.getenv("SEARCH_KEY", "<your_search_key>")
SEARCH_INDEX_NAME = os.getenv("SEARCH_INDEX", "rag")

# Initialize the Azure OpenAI client (key-based auth example)
client = AzureOpenAI(
    api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    api_version="2024-02-15-preview",
)


def query_search_service(query_text, top_n=5):
    """Query Azure Cognitive Search and return the top N documents."""
    search_client = SearchClient(
        endpoint=SEARCH_ENDPOINT,
        index_name=SEARCH_INDEX_NAME,
        credential=AzureKeyCredential(SEARCH_KEY),
    )
    results = search_client.search(query_text, top=top_n)
    # Each result is a dict-like object of the indexed fields.
    return [dict(r) for r in results]


def format_documents_for_prompt(documents):
    """Convert search documents to a single context string for the model."""
    parts = []
    for i, doc in enumerate(documents, start=1):
        # Adjust fields according to your index schema; this example uses 'title' and 'content'.
        title = doc.get("title") or f"Document {i}"
        content = doc.get("content", "")
        parts.append(f"Source: {title}\n{content}\n")
    return "\n---\n".join(parts)


def ask_question_with_rag(question):
    # 1) Retrieve relevant documents
    docs = query_search_service(question, top_n=5)
    if not docs:
        return "I don't have information on that topic."

    # 2) Prepare context from documents
    context_block = format_documents_for_prompt(docs)

    # 3) Prepare messages; the system message instructs the model to answer only
    #    when the supplied context supports the answer, and carries the context itself
    messages = [
        {
            "role": "system",
            "content": (
                "Answer only if you find supporting content in the provided data. "
                "Do not guess. If unsure, say: \"I don't have information on that topic.\"\n\n"
                f"Context documents:\n{context_block}"
            ),
        },
        {"role": "user", "content": question},
    ]

    # 4) Send the request to Azure OpenAI (RAG via explicit context handling)
    response = client.chat.completions.create(
        model=DEPLOYMENT_NAME,
        messages=messages,
        max_tokens=800,
        temperature=0.0,
    )
    return response.choices[0].message.content


def main():
    print("Azure OpenAI RAG Demo")
    print("----------------------")
    # Example question (replace with your prompt)
    question = "Who are the lead researchers for Project Orion?"
    print(f"\nQuestion: {question}")
    print("\nRetrieving information and generating answer...")
    answer = ask_question_with_rag(question)
    print("\nAnswer:")
    print(answer)


if __name__ == "__main__":
    main()
```
Note: This example demonstrates one pattern: client-side retrieval and context injection. The REST/SDK approaches also support a data_sources parameter so the service performs retrieval alongside generation. Choose the approach that best fits your architecture and security requirements.

Example console output (expected behavior)
If documents are indexed and retrieved, the assistant responds with a supported answer and cites the source.
If no relevant documents are found, the assistant replies: “I don’t have information on that topic.”
Example:
```
Azure OpenAI RAG Demo
----------------------

Question: Who are the lead researchers for Project Orion?

Retrieving information and generating answer...

Answer:
The lead researchers for Project Orion are:
- Dr. Eliza Tran (AI Systems Architect)
- Major Samuel Drake (Defense Operations Liaison)
- Ava Kohli (Azure Systems Engineer)
```
Ensure Azure Cognitive Search and its authentication method (API key, managed identity) are configured properly — data-source auth is tied to the search resource.
Use clear system messages to constrain the assistant and reduce hallucinations.
Choose between service-side retrieval (data_sources parameter) and app-side retrieval (explicit queries) based on latency, cost, and security trade-offs.
Monitor Azure release notes for new data-source connectors and SDK updates.
This workflow demonstrates how to ground Azure OpenAI responses with your own documents, helping you move from experiments to robust RAG-enabled applications.