Tags: azure, azure-blob-storage, azure-cognitive-search, blobstore, azure-search-.net-sdk

How to add field mapping in Azure AI Search Indexer for nested Json Array


I want to use Azure AI Search to run full-text search over JSON documents stored in Azure Blob Storage. Everything works except the field mappings for a nested JSON array. Here is the structure of the JSON documents I'm using:

{
  "conversation": {
    "datetime": "2023-11-27T09:45:00",
    "userDetails": {
      "userId": "98765",
      "username": "ProjectPro"
    },
    "messages": [
      {
        "sender": "user",
        "message": "Good morning! I have a question about the upcoming project."
      },
      {
        "sender": "assistant",
        "message": "Good morning! I'm here to help. What do you need assistance with regarding the project?"
      }
      //Other messages...
    ]
  }
}

I configured the search index as follows:

{
  "name": "conversation-index",
  "fields": [
    {"name": "datetime", "type": "Edm.String", "searchable": false, "filterable": true, "sortable": true, "facetable": false},
    {"name": "userId", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": false},
    {"name": "username", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": true, "facetable": false},
    {"name": "message", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false},
    {"name": "sender", "type": "Edm.String", "searchable": true, "filterable": true, "sortable": false, "facetable": false}
  ]
}

and the search indexer as follows:

{
  "name": "conversation-indexer",
  "dataSourceName": "conversation-datasource",
  "targetIndexName": "conversation-index",
  "schedule": { "interval": "PT1H" },
  "parameters": { "configuration": { "dataToExtract": "contentAndMetadata",  "parsingMode": "json" } },
  "fieldMappings": [
    {"sourceFieldName": "/conversation/datetime", "targetFieldName": "datetime"},
    {"sourceFieldName": "/conversation/userDetails/userId", "targetFieldName": "userId"},
    {"sourceFieldName": "/conversation/userDetails/username", "targetFieldName": "username"},
    {"sourceFieldName": "/conversation/messages[].message", "targetFieldName": "message"},
    {"sourceFieldName": "/conversation/messages[].sender", "targetFieldName": "sender"}
  ]
}

The field mappings for message and sender are not working: search returns null for both of these fields. What is the right way to index values from a nested JSON array?
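To see why both fields could come back null, here is a minimal sketch that resolves the sourceFieldName values as plain RFC 6901 JSON Pointers against the sample document (with the "//Other messages..." placeholder dropped so it parses). It assumes the indexer resolves these paths in a broadly similar way; it is an illustration, not the service's actual code. The "messages[].message" syntax is not part of JSON Pointer, so that whole segment is treated as one literal key, which the "conversation" object does not have.

```python
import json
import re

# Sample document from the question; the "//Other messages..." placeholder is
# removed so that the text is valid JSON.
SAMPLE = json.loads("""
{
  "conversation": {
    "datetime": "2023-11-27T09:45:00",
    "userDetails": {"userId": "98765", "username": "ProjectPro"},
    "messages": [
      {"sender": "user", "message": "Good morning! I have a question about the upcoming project."},
      {"sender": "assistant", "message": "Good morning! I'm here to help. What do you need assistance with regarding the project?"}
    ]
  }
}
""")

ARRAY_INDEX = re.compile(r"0|[1-9][0-9]*")  # RFC 6901: no leading zeros


def resolve_pointer(doc, pointer):
    """Resolve an RFC 6901 JSON Pointer; return None if any segment is missing."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError("a JSON Pointer must start with '/'")
    node = doc
    for raw in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" -> "/" first, then "~0" -> "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(node, dict):
            if token not in node:
                return None
            node = node[token]
        elif isinstance(node, list):
            if not ARRAY_INDEX.fullmatch(token) or int(token) >= len(node):
                return None
            node = node[int(token)]
        else:
            return None
    return node


for path in ["/conversation/userDetails/userId",
             "/conversation/messages[].message",
             "/conversation/messages[].sender"]:
    print(path, "->", resolve_pointer(SAMPLE, path))
# /conversation/userDetails/userId -> 98765
# /conversation/messages[].message -> None
# /conversation/messages[].sender -> None
```

Plain JSON Pointer can only address one array element at a time (for example "/conversation/messages/0/sender"), which is why some form of wildcard is needed to pick up every message.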


Solution

  • The correct way to select all values from an array is to use * in place of the array index. In your case, the sourceFieldName should be "/conversation/messages/*/sender" (and "/conversation/messages/*/message" for the message text). Keep in mind that because messages is an array, the output of this mapping is also an array: an array of strings, since you are selecting only the "sender" property from each object in the array. Your index defines "sender" as Edm.String, so the mapping still won't work; the field needs to be Collection(Edm.String) instead, and the same goes for "message". If you want each object in the "messages" array to become its own document in the index (so that "sender" stays a single Edm.String, as you have it defined now), I'd recommend checking out our new preview Index Projections feature, which lets you do exactly that.
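The two options in the answer can be illustrated with a small local sketch. It is not the indexer's implementation: "resolve_path" is a hypothetical helper that treats a "*" segment as "every element of this array", to show that "/conversation/messages/*/sender" yields a list of strings (hence Collection(Edm.String)). "project_messages" is a hypothetical emulation of the one-document-per-message shape that Index Projections aim for; the "id" and "parentId" field names and the key format are assumptions made for this sketch, since the real service generates its own child keys.

```python
import json
import re

# Sample document from the question (placeholder comment removed so it parses).
SAMPLE = json.loads("""
{
  "conversation": {
    "datetime": "2023-11-27T09:45:00",
    "userDetails": {"userId": "98765", "username": "ProjectPro"},
    "messages": [
      {"sender": "user", "message": "Good morning! I have a question about the upcoming project."},
      {"sender": "assistant", "message": "Good morning! I'm here to help. What do you need assistance with regarding the project?"}
    ]
  }
}
""")


def _resolve(node, tokens):
    """Walk the path; a '*' token fans out over every element of a list."""
    if not tokens:
        return node
    head, rest = tokens[0], tokens[1:]
    if head == "*":
        if not isinstance(node, list):
            return None
        return [_resolve(item, rest) for item in node]
    if isinstance(node, dict):
        return _resolve(node[head], rest) if head in node else None
    if isinstance(node, list) and re.fullmatch(r"0|[1-9][0-9]*", head) and int(head) < len(node):
        return _resolve(node[int(head)], rest)
    return None


def resolve_path(doc, path):
    """Hypothetical wildcard-aware resolver (no '~0'/'~1' escape handling)."""
    return _resolve(doc, path[1:].split("/"))


def project_messages(doc, parent_key):
    """Hypothetical emulation: one flat search document per message,
    with the parent conversation fields copied onto each child."""
    conv = doc["conversation"]
    return [
        {
            "id": f"{parent_key}_{i}",  # assumed key format, for illustration only
            "parentId": parent_key,     # assumed parent key field name
            "datetime": conv["datetime"],
            "userId": conv["userDetails"]["userId"],
            "username": conv["userDetails"]["username"],
            "sender": m["sender"],
            "message": m["message"],
        }
        for i, m in enumerate(conv["messages"])
    ]


# Option 1: a wildcard mapping produces a list, so the target field must be a collection.
print(resolve_path(SAMPLE, "/conversation/messages/*/sender"))  # ['user', 'assistant']

# Option 2: one document per message keeps "sender" a single string.
for child in project_messages(SAMPLE, "conv-1"):
    print(child["id"], child["sender"])
# conv-1_0 user
# conv-1_1 assistant
```

The trade-off: option 1 keeps one search document per conversation, but senders and messages become parallel collections, so a query cannot tell which sender wrote which message. Option 2 preserves that pairing at the cost of more documents in the index.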