
Elasticsearch Prefix query not working on nested documents


I'm using a prefix query in Elasticsearch. It works fine on top-level fields, but when I apply it to nested data it returns no results. A sample of the data I'm trying to query appears in the _source of the result below.

On a top-level field the prefix query works fine. Query:

{ "query": { "prefix" : { "duration": "7"} } }

Result:

{
   "took": 25, ... },
   "hits": {
      "total": 6,
      "max_score": 1,
      "hits": [
         {
        "_index": "itemresults",
        "_type": "itemresult",
        "_id": "ITEM_RESULT_7c8649c2-6cb0-487e-bb3c-c4bf0ad28a90_8bce0a3f-f951-4a01-94b5-b55dea1a2752_7c965241-ad0a-4a83-a400-0be84daab0a9_61",
        "_score": 1,
        "_source": {
           "score": 1,
           "studentId": "61",
           "timestamp": 1377399320017,
           "groupIdentifiers": {},
           "assessmentItemId": "7c965241-ad0a-4a83-a400-0be84daab0a9",
           "answered": true,
           "duration": "7.078",
           "metadata": {
              "Korrektur": "a",
              "Matrize12_13": "MA.1.B.1.d.1",
              "Kompetenz": "ZuV",
              "Zyklus": "Z2",
              "Schwierigkeit": "H",
              "Handlungsaspekt": "AuE",
              "Fach": "MA",
              "Aufgabentyp": "L"
           },
           "assessmentSessionId": "7c8649c2-6cb0-487e-bb3c-c4bf0ad28a90",
           "assessmentId": "8bce0a3f-f951-4a01-94b5-b55dea1a2752"
        }
     },

Applying the prefix query to a field inside the nested structure 'metadata' returns no results:

{ "query": { "prefix" : { "metadata.Fach": "M"} } }

Result:

{
   "took": 18,
   "timed_out": false,
   "_shards": {
      "total": 15,
      "successful": 15,
      "failed": 0
   },
   "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
   }
}

What am I doing wrong? Is it possible to apply a prefix query to nested data at all?


Solution

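  • It does not depend on whether the field is nested. It depends on your mapping, specifically on whether the string is analyzed at index time. A prefix query does not analyze its input, so the prefix you type is compared as-is against the terms stored in the index.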

    Here is an example. I created an index with the following mapping:

    curl -XPUT 'http://localhost:9200/test/' -d '
    {
      "mappings": {
        "test" : {
          "properties" : {
            "text_1" : {
              "type" : "string",
              "index" : "analyzed"
            },
            "text_2" : {
              "type" : "string",
              "index" : "not_analyzed"
            }
          }
        }
      }
    }'
    

    In short, there are two string fields: text_1 is analyzed and text_2 is not_analyzed. Now I index the following document:

    curl -XPUT 'http://localhost:9200/test/test/1' -d '
    {
    "text_1" : "Hello world",
    "text_2" : "Hello world"
    }'
    

    text_1 query

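    Because text_1 is analyzed, Elasticsearch splits the value into terms and, among other things, converts them to lowercase, so the index holds hello and world rather than Hello world. The prefix in the query is not lowercased, so the following query with an uppercase H finds no document: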

    curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
    { "query": { "prefix" : { "text_1": "H"} } }
    '
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 0,
        "max_score" : null,
        "hits" : [ ]
      }
    }
    

    If I use lowercase in the query instead, it matches:

    curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
    { "query": { "prefix" : { "text_1": "h"} } }
    '
    {
      "took" : 2,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "test",
          "_type" : "test",
          "_id" : "1",
          "_score" : 1.0, "_source" :
    {
    "text_1" : "Hello world",
    "text_2" : "Hello world"
    }
        } ]
      }
    }
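
The two text_1 results above can be mimicked with a toy shell model. This is only a sketch of the idea and not how Elasticsearch is implemented: analyzed_terms and count_prefix are hypothetical helpers, and splitting on spaces plus lowercasing is a rough stand-in for the default analysis.

```shell
# Toy model of index-time analysis; NOT Elasticsearch's real implementation.

# Rough stand-in for an analyzed field: lowercase the value and
# split it on spaces, emitting one term per line.
analyzed_terms() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr -s ' ' '\n'
}

# Count the terms on stdin that start with the given prefix.
# The prefix is compared as-is, just like an unanalyzed prefix query.
count_prefix() {
  n=0
  while IFS= read -r term; do
    case "$term" in
      "$1"*) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}

analyzed_terms "Hello world" | count_prefix "H"   # prints 0
analyzed_terms "Hello world" | count_prefix "h"   # prints 1
```

The uppercase prefix finds nothing because the only stored terms in this model are hello and world.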
    

    text_2 query

    Because text_2 is not analyzed, the whole value is indexed unchanged as the single term Hello world, so the original query with an uppercase H matches:

    curl -XGET 'http://localhost:9200/test/test/_search?pretty=true' -d '
    { "query": { "prefix" : { "text_2": "H"} } }
    '
    {
      "took" : 1,
      "timed_out" : false,
      "_shards" : {
        "total" : 5,
        "successful" : 5,
        "failed" : 0
      },
      "hits" : {
        "total" : 1,
        "max_score" : 1.0,
        "hits" : [ {
          "_index" : "test",
          "_type" : "test",
          "_id" : "1",
          "_score" : 1.0, "_source" :
    {
    "text_1" : "Hello world",
    "text_2" : "Hello world"
    }
        } ]
      }
    }
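
Applied to the question, a likely explanation is that metadata.Fach uses the default analyzed mapping, so the value MA is indexed as ma. That is an assumption, since the question does not show its mapping. Under that assumption, a minimal sketch of the fix is to lowercase the prefix before sending it. The prefix_query helper below is hypothetical; the index and type names are taken from the _index and _type values in the question's result.

```shell
# Hypothetical helper: build a prefix query body, lowercasing the prefix
# so it can match the lowercased terms of an analyzed string field.
prefix_query() {
  field="$1"
  value=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  printf '{ "query": { "prefix" : { "%s": "%s" } } }\n' "$field" "$value"
}

# Print the body that would be sent for the query from the question.
prefix_query "metadata.Fach" "M"

# To run it against a node (assumes Elasticsearch on localhost:9200):
# curl -XGET 'http://localhost:9200/itemresults/itemresult/_search?pretty=true' \
#   -d "$(prefix_query 'metadata.Fach' 'M')"
```

Alternatively, mapping metadata.Fach as not_analyzed, as done for text_2 above, would let the uppercase prefix match, but changing the mapping of an existing field requires reindexing the data.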