Azure OpenAI Service: implement RAG with Azure AI Search

João Barros · October 15, 2024 · 1 min read

Retrieval-Augmented Generation (RAG) is an architectural pattern for building AI assistants that answer from organization-specific knowledge rather than from the model's training data alone. It pairs Azure AI Search, which retrieves the relevant documents, with Azure OpenAI Service, which generates the answer from them.

RAG architecture

1. Ingestion (offline):
   Documents → Chunking → Embedding (text-embedding-ada-002) → AI Search Index

2. Query (runtime):
   User question
     → Question embedding
     → AI Search (vector + keyword search) → Top-K relevant chunks
     → Prompt: "Based on these documents: {chunks} — answer: {question}"
     → Azure OpenAI GPT-4o → Grounded answer
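
The chunking step of the ingestion pipeline above is not specified further, so here is a minimal sketch of one common approach: fixed-size character windows with overlap. The function name split_into_chunks and the default sizes are assumptions for illustration, not values from the original pipeline; real systems often split on sentence or heading boundaries instead.

```python
def split_into_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters, so content near a
    boundary is present in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks
```

Each returned chunk would then be embedded and uploaded as one search document. Smaller chunks tend to give more precise retrieval, while larger ones carry more context per hit, so the sizes are worth tuning on your own documents.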

Index documents in AI Search

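import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

openai_client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version="2024-02-01"
)

# Client for an existing index with "id", "content", "embedding" and "source" fields.
# The AZURE_SEARCH_* variable names are placeholders; use your own configuration.
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name=os.environ["AZURE_SEARCH_INDEX"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"])
)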

def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-ada-002",
        input=text
    )
    return response.data[0].embedding

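# Call once for each document chunk (index_chunk is an illustrative wrapper):
def index_chunk(chunk_id: str, chunk_text: str, document_path: str) -> None:
    search_client.upload_documents(documents=[{
        "id": chunk_id,
        "content": chunk_text,
        "embedding": get_embedding(chunk_text),
        "source": document_path
    }])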
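
The "id" value above has to be a valid Azure AI Search document key, which may contain only letters, digits, underscores, dashes and equal signs, so a raw file path will not work. A minimal sketch of one way to derive a stable key from the path and the chunk position follows; make_chunk_id is a hypothetical helper, and hashing the path is an assumption, not the only option.

```python
import hashlib


def make_chunk_id(document_path: str, chunk_index: int) -> str:
    """Build a deterministic, key-safe id for one chunk of one document.

    The path is hashed because characters such as "/" and "." are not
    allowed in document keys; the chunk index keeps ids unique per document.
    """
    digest = hashlib.sha256(document_path.encode("utf-8")).hexdigest()[:32]
    return f"{digest}-{chunk_index}"
```

Because the id is deterministic, re-ingesting the same document uploads to the same keys and overwrites the earlier chunks rather than duplicating them.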

Full RAG query

from azure.search.documents.models import VectorizedQuery

def rag_query(question: str) -> str:
    # 1. Question embedding
    q_embedding = get_embedding(question)

    # 2. Retrieve relevant chunks (hybrid: keyword + vector search)
    results = search_client.search(
        search_text=question,
        vector_queries=[VectorizedQuery(vector=q_embedding, k_nearest_neighbors=5, fields="embedding")],
        select=["content", "source"],
        top=5  # without this, the query returns up to 50 results by default
    )
    # Keep each chunk's source next to its text so the model can cite it
    context = "\n\n".join(f"[{r['source']}]\n{r['content']}" for r in results)

    # 3. Generate the answer with GPT-4o
    response = openai_client.chat.completions.create(
        model="gpt-4o",  # on Azure, this is the name of your model deployment
        messages=[
            {"role": "system", "content": "Answer only based on the provided documents. Cite the sources."},
            {"role": "user",   "content": f"Documents:\n{context}\n\nQuestion: {question}"}
        ]
    )
    return response.choices[0].message.content
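
The system prompt asks the model to cite sources, which works best when each chunk in the prompt is clearly labelled and the total context stays within a size budget. The following is a minimal sketch of such a formatting step. build_context and max_chars are hypothetical names, and counting characters is only a rough stand-in for counting tokens.

```python
def build_context(results, max_chars: int = 6000) -> str:
    """Format retrieved chunks as numbered, source-labelled blocks.

    `results` is any iterable of mappings with "content" and "source" keys,
    in ranked order. Chunks are added until the next one would push the
    context past max_chars; lower-ranked chunks are then dropped.
    """
    blocks = []
    used = 0
    for i, r in enumerate(results, start=1):
        block = f"[{i}] Source: {r['source']}\n{r['content']}"
        extra = len(block) + (2 if blocks else 0)  # 2 for the "\n\n" separator
        if used + extra > max_chars:
            break
        blocks.append(block)
        used += extra
    return "\n\n".join(blocks)
```

The search results can be passed to this function directly, since each result behaves like a mapping. Numbered labels also let the model cite "[1]" or "[2]", which is easy to map back to a source when checking an answer.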

Conclusion

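RAG with Azure OpenAI and AI Search suits enterprise assistants whose answers must be grounded in internal documents. For knowledge bases that change frequently it is more practical than fine-tuning, because updating the index does not require retraining the model. It is also more controllable than relying on what the model learned in training, because each answer can be traced back to the retrieved sources.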