I found this documentation about chatting with your own data using AI-Search & OpenAI.
It works fine for my data; however, I don't get any additional context aside from the content and the score:
{"content":"<MY CONTENT>", "id":null,"title":null,"filepath":null,"url":null,"metadata":{"chunking":"orignal document size=2000. Scores=3.6962261Org Highlight count=31."},"chunk_id":"0"}
I think the additional fields in AI-Search need to be specified somewhere in the code, but I don't know where, and I couldn't find any example of it.
In the Azure OpenAI Chat Playground you can select the fields within your AI-Search index, and they are then correctly displayed in the sample chat app.
How can I achieve the same in my code using the code example referenced above?
I found the solution myself. It turns out that you do not need to use the 'default' names for your AI-Search index fields; you can name your index fields whatever you want. However, you need to map your field names to the expected defaults. Here is a working example:
import openai

# config and ai_search are assumed to be dictionaries holding the Azure OpenAI
# and AI-Search settings (endpoints, keys, deployment and index names).

def ask_llm_citation(USER_INPUT: str, AZURE_OPENAI_SYSTEM_MESSAGE: str, NR_DOCUMENTS: int, STRICTNESS: int):
    def parse_multi_columns(columns: str) -> list:
        # Fields that can map to several index columns must be passed as a list,
        # so split a "|"- or ","-separated string into its parts.
        if "|" in columns:
            return columns.split("|")
        else:
            return columns.split(",")

    endpoint = config["OPENAI_API_BASE"]
    api_key = config["OPENAI_API_KEY"]
    # set the deployment name for the model we want to use
    deployment = config["OPENAI_API_GPT_DEPLOYMENT_NAME"]

    client = openai.AzureOpenAI(
        base_url=f"{endpoint}/openai/deployments/{deployment}/extensions",
        api_key=api_key,
        api_version="2023-09-01-preview"
    )

    response = client.chat.completions.create(
        messages=[{"role": "user", "content": USER_INPUT}],
        model=deployment,
        extra_body={
            "dataSources": [
                {
                    "type": "AzureCognitiveSearch",
                    "parameters": {
                        "endpoint": ai_search["AZURE_COGNITIVE_SEARCH_ENDPOINT"],
                        "key": ai_search["AZURE_COGNITIVE_SEARCH_KEY"],
                        "indexName": ai_search["AZURE_COGNITIVE_SEARCH_INDEX_NAME"],
                        # Map your own index field names to the defaults the service
                        # expects, so that filepath and url show up in the citations.
                        "fieldsMapping": {
                            "contentFields": parse_multi_columns("content"),
                            "urlField": "url_name",
                            "filepathField": "file_name",
                            "vectorFields": parse_multi_columns("content_vector")
                        },
                        "embeddingDeploymentName": config["OPENAI_API_DEPLOYMENT_NAME"],
                        "query_type": "vectorSimpleHybrid",
                        "inScope": True,
                        "roleInformation": AZURE_OPENAI_SYSTEM_MESSAGE,
                        "topNDocuments": NR_DOCUMENTS,
                        "strictness": STRICTNESS
                    }
                }
            ]
        },
        stream=True,
    )

    # Stream the response back to the caller chunk by chunk.
    for chunk in response:
        delta = chunk.choices[0].delta
        yield delta
Note: contentFields and vectorFields need to be lists rather than strings, since multiple fields are possible there. That is why parse_multi_columns converts the comma- or pipe-separated string into a list.
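For completeness, here is a minimal sketch of how the generator might be consumed. The question text, system message, and parameter values are placeholders for illustration, and the exact shape of the streamed deltas depends on the API version:

# Hypothetical usage example: stream the answer to the console.
system_message = "You are an assistant that answers only from the indexed documents."

for delta in ask_llm_citation(
    USER_INPUT="What does the documentation say about deployment?",
    AZURE_OPENAI_SYSTEM_MESSAGE=system_message,
    NR_DOCUMENTS=5,
    STRICTNESS=3,
):
    # Not every streamed chunk carries text content, so guard before printing.
    content = getattr(delta, "content", None)
    if content:
        print(content, end="", flush=True)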