
How to get proper context from AI-Search documents when calling OpenAI completion endpoint


I found this documentation about chatting with your own data using AI-Search & OpenAI.

It works fine for my data; however, I don't get any additional context aside from the content and the score:

{"content":"<MY CONTENT>", "id":null,"title":null,"filepath":null,"url":null,"metadata":{"chunking":"orignal document size=2000. Scores=3.6962261Org Highlight count=31."},"chunk_id":"0"}

I think the additional fields in AI Search need to be specified somewhere in the code, but I don't know where, and I couldn't find any example of it.

In the Azure OpenAI Chat Playground you can select the fields within your AI Search index, and they are then also correctly displayed in the sample chat app.


How can I achieve the same in my code using the code example referenced above?


Solution

  • I found the solution myself. It turns out that you do not need to use the 'default' names for your AI Search index fields: you can name your index fields whatever you want. However, you need to map your field names to the expected defaults via `fieldsMapping`. Here is a working example:

    import openai  # requires the openai v1 SDK

    # NOTE: `config` and `ai_search` are dictionaries holding the relevant
    # settings; load them however your application does (env vars, files, ...).
    def ask_llm_citation(USER_INPUT: str, AZURE_OPENAI_SYSTEM_MESSAGE: str, NR_DOCUMENTS: int, STRICTNESS: int):
        def parse_multi_columns(columns: str) -> list:
            # A fields setting may name several index fields, pipe- or comma-separated
            if "|" in columns:
                return columns.split("|")
            return columns.split(",")
    
        endpoint = config["OPENAI_API_BASE"]
        api_key = config["OPENAI_API_KEY"]
        # set the deployment name for the model we want to use
        deployment = config["OPENAI_API_GPT_DEPLOYMENT_NAME"]
    
        client = openai.AzureOpenAI(
            base_url=f"{endpoint}/openai/deployments/{deployment}/extensions",
            api_key=api_key,
            api_version="2023-09-01-preview"
        )
    
        response = client.chat.completions.create(
            messages=[{"role": "user", "content": USER_INPUT}],
            model=deployment,
            extra_body={
                "dataSources": [
                    {
                        "type": "AzureCognitiveSearch",
                        "parameters": {
                            "endpoint": ai_search["AZURE_COGNITIVE_SEARCH_ENDPOINT"],
                            "key": ai_search["AZURE_COGNITIVE_SEARCH_KEY"],
                            "indexName": ai_search["AZURE_COGNITIVE_SEARCH_INDEX_NAME"],
                            "fieldsMapping": {
                                "contentFields": parse_multi_columns("content"),
                                "urlField": "url_name",
                                "filepathField": "file_name",
                                "vectorFields": parse_multi_columns("content_vector")
                            },
                            "embeddingDeploymentName": config["OPENAI_API_DEPLOYMENT_NAME"],
                            "query_type":"vectorSimpleHybrid",
                            "inScope": True,
                            "roleInformation": AZURE_OPENAI_SYSTEM_MESSAGE,
                            "topNDocuments": NR_DOCUMENTS,
                            "strictness":  STRICTNESS
                        }
                    }
                ]
            },
            stream=True,
        )
        # Stream the response back to the caller chunk by chunk
        for chunk in response:
            delta = chunk.choices[0].delta
            yield delta
    

    Note: `contentFields` and `vectorFields` must be lists, not strings, because multiple fields are possible here. That is why `parse_multi_columns` converts the setting to a list.
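As a standalone illustration of that last point, here is a minimal sketch (independent of any Azure credentials) of how a pipe- or comma-separated fields setting becomes the list that `fieldsMapping` expects:

```python
def parse_multi_columns(columns: str) -> list:
    # A fields setting may name several index fields, separated by "|" or ","
    if "|" in columns:
        return columns.split("|")
    return columns.split(",")

# A single field still yields a one-element list; multiple fields yield one entry each
print(parse_multi_columns("content"))          # ['content']
print(parse_multi_columns("content|summary"))  # ['content', 'summary']
```

The generator itself can then be consumed with something like `for delta in ask_llm_citation(...): print(delta.content or "", end="")`.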