I'm struggling to understand the pros and cons of each of these approaches for implementing RAG using Azure OpenAI with AI Search as the source, with the Python SDK. Both work well, but option B looks much cleaner. Why should we even bother doing all the steps in option A ourselves? Am I missing something?
I can only think of some use cases where you need to get the AI Search chunks for evaluation (RAGAS), which might not be possible with option B.
A) Querying AI Search yourself
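```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.search.documents import SearchClient
from openai import AzureOpenAI

# Placeholders: replace with your own endpoints and deployment name
AZURE_OPENAI_ACCOUNT = "https://<your-openai-account>.openai.azure.com/"
AZURE_SEARCH_SERVICE = "https://<your-search-service>.search.windows.net"
AZURE_DEPLOYMENT_MODEL = "<your-deployment-name>"

# Keyless (Microsoft Entra ID) authentication for both services
credential = DefaultAzureCredential()
token_provider = get_bearer_token_provider(
    credential, "https://cognitiveservices.azure.com/.default"
)

openai_client = AzureOpenAI(
    api_version="2024-06-01",
    azure_endpoint=AZURE_OPENAI_ACCOUNT,
    azure_ad_token_provider=token_provider,
)

search_client = SearchClient(
    endpoint=AZURE_SEARCH_SERVICE,
    index_name="hotels-sample-index",
    credential=credential,
)

GROUNDED_PROMPT = """
You are a friendly assistant that recommends hotels based on activities and amenities.
Query: {query}
Sources:\n{sources}
"""

query = "Can you recommend a few hotels with complimentary breakfast?"

# 1. Retrieve: run the search yourself and keep the top 5 documents
search_results = search_client.search(
    search_text=query,
    top=5,
    select="Description,HotelName,Tags",
)
sources_formatted = "\n".join(
    f'{document["HotelName"]}:{document["Description"]}:{document["Tags"]}'
    for document in search_results
)

# 2. Read: send the query plus the retrieved sources to the model
response = openai_client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": GROUNDED_PROMPT.format(query=query, sources=sources_formatted),
        }
    ],
    model=AZURE_DEPLOYMENT_MODEL,
)
print(response.choices[0].message.content)
```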
B) Letting Azure OpenAI query AI Search itself
```python
import os

import openai

endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT")
api_key = os.environ.get("AZURE_OPENAI_API_KEY")
deployment = os.environ.get("AZURE_OPENAI_DEPLOYMENT_ID")

client = openai.AzureOpenAI(
    azure_endpoint=endpoint,
    api_key=api_key,
    api_version="2024-02-01",
)

# "On Your Data": Azure OpenAI queries the AI Search index itself,
# so retrieval and prompt construction happen on the service side
completion = client.chat.completions.create(
    model=deployment,
    messages=[
        {
            "role": "user",
            "content": "What are my available health plans?",
        },
    ],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": os.environ["AZURE_AI_SEARCH_ENDPOINT"],
                    "index_name": os.environ["AZURE_AI_SEARCH_INDEX"],
                    "authentication": {
                        "type": "api_key",
                        "key": os.environ["AZURE_AI_SEARCH_API_KEY"],
                    },
                },
            }
        ],
    },
)
print(completion.choices[0].message.content)
```
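Option B does not hide the retrieved chunks completely: the On Your Data response attaches them as citations in a `context` object on the assistant message. A minimal sketch of pulling them out as RAGAS-style contexts, assuming the response is first converted to a dict with `completion.model_dump()` and that each citation carries a `content` field (verify the shape against the API version you use, and note the citations may not be every chunk that was retrieved). `extract_contexts` is a hypothetical helper and the sample response is made up.

```python
from typing import Any


def extract_contexts(response: dict[str, Any]) -> list[str]:
    """Collect the text of every citation in an On Your Data response.

    Assumes the shape choices[0].message.context.citations[*].content and
    returns an empty list when the service returned no citations.
    """
    message = response["choices"][0]["message"]
    citations = (message.get("context") or {}).get("citations") or []
    return [c["content"] for c in citations if c.get("content")]


# With a real call (needs network): contexts = extract_contexts(completion.model_dump())
# Made-up response with the assumed shape:
sample = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "Answer text [doc1][doc2]",
                "context": {
                    "citations": [
                        {"content": "first retrieved chunk", "title": "doc1"},
                        {"content": "second retrieved chunk", "title": "doc2"},
                    ]
                },
            }
        }
    ]
}
print(extract_contexts(sample))
```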
There are many approaches you can take with Azure OpenAI and AI Search. Your options A and B fall under:
A, Retrieve Then Read: a simple retrieve-then-read implementation that uses the AI Search and OpenAI APIs directly. It first retrieves the top documents from search, then constructs a prompt with them, and then uses OpenAI to generate a completion (answer) from that prompt. Read more here: https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/approaches/retrievethenread.py
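The retrieve-then-read flow boils down to one search and one completion. A minimal sketch that stays testable offline by passing both steps in as plain functions; `retrieve_then_read`, `search_fn`, and `complete_fn` are hypothetical names, and in practice they would wrap `search_client.search` and `openai_client.chat.completions.create` from option A. Returning the sources alongside the answer is what makes this style convenient for evaluation.

```python
from typing import Callable, Iterable


def retrieve_then_read(
    query: str,
    search_fn: Callable[[str], Iterable[dict]],
    complete_fn: Callable[[str], str],
    prompt_template: str,
) -> tuple[str, list[str]]:
    # Step 1: retrieve the top documents for the raw user query
    sources = [f'{d["HotelName"]}:{d["Description"]}' for d in search_fn(query)]
    # Step 2: build a grounded prompt from the query and the sources
    prompt = prompt_template.format(query=query, sources="\n".join(sources))
    # Step 3: a single completion call; also return the sources (e.g. for RAGAS)
    return complete_fn(prompt), sources
```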
B, Chat Read Retrieve Read: a multi-step approach that first uses OpenAI to turn the user's question into a search query, then uses Azure AI Search to retrieve relevant documents, and then sends the conversation history, the original user question, and the search results to OpenAI to generate a response.
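The extra step compared with retrieve-then-read is the first model call that rewrites the conversation into a standalone search query. A minimal sketch with injected functions; `chat_read_retrieve_read`, `complete_fn`, `search_fn`, and the rewrite instruction are hypothetical. The azure-search-openai-demo sample does this step with its own prompts and tooling, which are not reproduced here.

```python
from typing import Callable, Iterable

Message = dict[str, str]


def chat_read_retrieve_read(
    history: list[Message],
    question: str,
    complete_fn: Callable[[list[Message]], str],
    search_fn: Callable[[str], Iterable[str]],
) -> str:
    # Read: ask the model to turn the conversation plus the new question
    # into a standalone search query
    rewrite_messages = history + [
        {
            "role": "user",
            "content": "Rewrite the following question as a short search query, "
            f"using the conversation so far for context: {question}",
        }
    ]
    search_query = complete_fn(rewrite_messages).strip()

    # Retrieve: search with the generated query, not the raw question
    sources = "\n".join(search_fn(search_query))

    # Read: answer with the history, the original question and the sources
    answer_messages = (
        [{"role": "system", "content": f"Answer using only these sources:\n{sources}"}]
        + history
        + [{"role": "user", "content": question}]
    )
    return complete_fn(answer_messages)
```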
Function Calling, as in this sample implementation: https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/app/backend/approaches/chatreadretrieveread.py
Depending on your business needs, you can take this further with multi-step flows that use not only data_sources but also your own Python functions: https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/function-calling
What's better?
A is simple and straightforward. However, if the user's query returns no results from AI Search, the model has nothing to ground its answer on, OpenAI takes no further action, and the conversation ends early. This happens often, because users do not always state what they need in their first query, or may not know how to phrase it.
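One way to soften this in option A is to check for an empty result set before calling the model and ask the user a follow-up question instead. A minimal sketch; `answer_or_ask`, `search_fn`, `complete_fn`, and the fallback wording are all hypothetical, and in practice the two functions would wrap the search and chat calls from option A.

```python
from typing import Callable, Sequence

# Hypothetical follow-up question used when retrieval finds nothing
FALLBACK = "I couldn't find anything for that. Could you add details such as a city or an amenity?"


def answer_or_ask(
    query: str,
    search_fn: Callable[[str], Sequence[str]],
    complete_fn: Callable[[str, Sequence[str]], str],
) -> str:
    docs = list(search_fn(query))
    if not docs:
        # No grounding data: ask the user to rephrase instead of answering blind
        return FALLBACK
    # Normal retrieve-then-read path
    return complete_fn(query, docs)
```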
B helps expand the conversation context and lets OpenAI decide which function to run, which makes the interaction feel more human-like. How far you branch the conversation depends entirely on your business needs. For example, when a user asks "How's the weather today?", the weather function needs two parameters: "location" and "unit" (Celsius or Fahrenheit). If either is missing, OpenAI will ask the user something like "Please let me know your location and unit." It keeps asking until it has both, and then runs the function.
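The weather example maps to a tool definition in the Chat Completions `tools` format with both parameters marked required. The model uses that schema to decide whether it can call the function or should ask a follow-up question, but it can still omit or invent arguments, so a defensive check on your side is worthwhile. A minimal sketch; `get_current_weather` is a stub, and `missing_arguments` and `run_tool_call` are hypothetical helpers.

```python
import json

# Tool schema in the Chat Completions "tools" format; both parameters are
# required, matching the weather example above (names are illustrative)
GET_CURRENT_WEATHER = {
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name, e.g. Paris"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["location", "unit"],
        },
    },
}
TOOLS = [GET_CURRENT_WEATHER]


def get_current_weather(location: str, unit: str) -> str:
    # Stub: a real implementation would call a weather service here
    return f"(stub) weather for {location} in {unit}"


FUNCTIONS = {"get_current_weather": get_current_weather}
SCHEMAS = {t["function"]["name"]: t for t in TOOLS}


def missing_arguments(tool: dict, arguments_json: str) -> list[str]:
    """Return the required parameters that are absent or empty in a tool call."""
    args = json.loads(arguments_json or "{}")
    required = tool["function"]["parameters"].get("required", [])
    return [name for name in required if args.get(name) in (None, "")]


def run_tool_call(call: dict) -> dict:
    """Execute one tool call (given as a dict) and build the 'tool' message."""
    name = call["function"]["name"]
    missing = missing_arguments(SCHEMAS[name], call["function"]["arguments"])
    if missing:
        # Hand the gap back to the model so it can ask the user a follow-up
        content = "Missing required arguments: " + ", ".join(missing)
    else:
        content = FUNCTIONS[name](**json.loads(call["function"]["arguments"]))
    return {"role": "tool", "tool_call_id": call["id"], "content": content}
```

In a real loop you would pass `tools=TOOLS` to `client.chat.completions.create(...)`; when the returned message has `tool_calls`, append that message plus one `run_tool_call(...)` result per call to `messages` and call the model again. The SDK returns typed objects, so convert each tool call with `.model_dump()` first.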