
Azure AI Search field mappings for JSON and retrievable fields


I'm currently implementing RAG on Azure using OpenAI and Azure AI Search (formerly Azure Cognitive Search). I have around 50-65 JSON files, saved as .txt, that make up the enterprise data I need to search. In the chatbot's references I only get the text "citation", whereas I want to retrieve the DOI (the URL of the document online) and the title of the scientific article.

I have formatted my JSON files as shown below. The keys 'content' and 'Title' are the only ones I want to run semantic search on and also make retrievable; the DOI (URL) only needs to be retrievable.

{
  "content": "The human eye is a complex organ responsible for vision, capturing light and converting it into neural signals for the brain to interpret. It consists of multiple parts, including the cornea, lens, and retina, each playing a vital role in the process of seeing.",
  "date": "2023-07-15",
  "Title": "The Magic of Vision",
  "editorial_house": "MIT Research Meds and Public Health",
  "doi": "https://doi.org/10.1234",
  "author": "Dr. John Mayer"
}

However, on the Azure AI Search page my other fields are never offered for selection as metadata:

enter image description here

As you can see, only 'content' appears, and I still get this unappealing citation in the footnote references of my searches. How can I make my data retrievable in the way I want?

I'm not using code for this, only the Azure Studio web interface, so I'm not sure whether code is the only way to do it.

My desired output is something like this:

enter image description here

Is this possible? Can it be done in Azure Studio, or only with code?

Update

I'm setting up the custom mappings like this:

enter image description here

However, while I now get the correct title and content in the citations panel, I'm missing the DOI, which is the URL of the publication. Is there something I'm doing wrong?

enter image description here


Solution

  • You can do this either through Import data or with a custom index definition.

    In the portal, after clicking Import data you get the Connect to your data step; there, set the parsing mode to JSON.

    enter image description here

    You will then get the correct fields.

    enter image description here

    Here you can remove whichever field you don't want.
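
The JSON parsing mode treats each blob as one JSON document, so it can help to confirm beforehand that every .txt file really contains a single well-formed JSON object with the expected keys. The following is a minimal sketch of such a check, not part of the portal workflow; the key list is copied from the sample document in the question, and the function names `check_document` and `check_folder` are hypothetical.

```python
import json
from pathlib import Path

# Keys taken from the sample document in the question (note the capital "T" in "Title").
EXPECTED_KEYS = {"content", "date", "Title", "editorial_house", "doi", "author"}


def check_document(text: str) -> list[str]:
    """Return a list of problems found in one file's text (empty if none)."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value is not a JSON object"]
    missing = sorted(EXPECTED_KEYS - doc.keys())
    return [f"missing key: {key}" for key in missing]


def check_folder(folder: str) -> dict[str, list[str]]:
    """Check every .txt file in a folder and map file name -> list of problems."""
    return {
        path.name: check_document(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.txt"))
    }
```

Calling `check_folder("path/to/files")` on a local copy of the files returns a dictionary in which any non-empty list points at a file that is likely to index incorrectly.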

    Another method is to create the index with a custom definition like the one below.

      [
        {
          "name": "content",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "stored": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "standard.lucene",
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        },
        {
          "name": "date",
          "type": "Edm.DateTimeOffset",
          "searchable": false,
          "filterable": false,
          "retrievable": true,
          "stored": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        },
        {
          "name": "Title",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "stored": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "standard.lucene",
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        },
        {
          "name": "editorial_house",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "stored": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "standard.lucene",
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        },
        {
          "name": "doi",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "stored": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "standard.lucene",
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        },
        {
          "name": "author",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "stored": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "standard.lucene",
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        },
        {
          "name": "metadata_storage_size",
          "type": "Edm.Int64",
          "searchable": false,
          "filterable": false,
          "retrievable": true,
          "stored": true,
          "sortable": false,
          "facetable": false,
          "key": false,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": null,
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        },
        {
          "name": "metadata_storage_path",
          "type": "Edm.String",
          "searchable": true,
          "filterable": false,
          "retrievable": true,
          "stored": true,
          "sortable": false,
          "facetable": false,
          "key": true,
          "indexAnalyzer": null,
          "searchAnalyzer": null,
          "analyzer": "standard.lucene",
          "normalizer": null,
          "dimensions": null,
          "vectorSearchProfile": null,
          "vectorEncoding": null,
          "synonymMaps": []
        }
      ]
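
If you prefer to generate the definition instead of writing it by hand, the same field list can be built in a few lines of Python. This is only a sketch under stated assumptions: the helper names `field` and `build_index` and the index name are hypothetical, optional attributes that are null in the definition above are left out, and `doi` is deliberately marked non-searchable here (unlike the definition above) because the question asks for it to be retrievable only.

```python
import json


def field(name, edm_type="Edm.String", searchable=False, key=False):
    """Build one field definition; every field is retrievable."""
    definition = {
        "name": name,
        "type": edm_type,
        "searchable": searchable,
        "filterable": False,
        "retrievable": True,
        "sortable": False,
        "facetable": False,
        "key": key,
    }
    if searchable:
        # Same analyzer as in the hand-written definition above.
        definition["analyzer"] = "standard.lucene"
    return definition


def build_index(index_name):
    """Return an index definition body with the fields from the sample document."""
    return {
        "name": index_name,
        "fields": [
            field("content", searchable=True),
            field("date", "Edm.DateTimeOffset"),
            field("Title", searchable=True),
            field("editorial_house", searchable=True),
            field("doi"),  # assumption: retrievable only, as the question requests
            field("author", searchable=True),
            field("metadata_storage_size", "Edm.Int64"),
            field("metadata_storage_path", searchable=True, key=True),
        ],
    }


if __name__ == "__main__":
    # "articles-index" is a placeholder name.
    print(json.dumps(build_index("articles-index"), indent=2))
```

The printed JSON can be pasted into the portal's index JSON editor or sent as the body of a create-index request to the search service's REST API.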
    

    Next, configure the indexer as shown below.

    enter image description here

    After saving, reset and run the indexer.
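
Since the indexer settings are only shown in a screenshot, here is a hedged sketch of what an equivalent indexer definition might look like when written as JSON. The `parsingMode` setting and the `fieldMappings` structure are part of the indexer definition format, but the concrete names and the choice to base64-encode `metadata_storage_path` for the key field are assumptions, not a transcription of the screenshot. Fields in the JSON documents whose names match index fields should be picked up without explicit mappings.

```python
import json


def build_indexer(name, data_source_name, index_name):
    """Return an indexer definition body that parses each blob as one JSON document."""
    return {
        "name": name,
        "dataSourceName": data_source_name,
        "targetIndexName": index_name,
        # Parse each blob as a single JSON document instead of plain text.
        "parameters": {"configuration": {"parsingMode": "json"}},
        "fieldMappings": [
            {
                # Assumption: encode the blob path so it is a valid document key.
                "sourceFieldName": "metadata_storage_path",
                "targetFieldName": "metadata_storage_path",
                "mappingFunction": {"name": "base64Encode"},
            }
        ],
    }


if __name__ == "__main__":
    # All three names are placeholders.
    body = build_indexer("articles-indexer", "articles-datasource", "articles-index")
    print(json.dumps(body, indent=2))
```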