elasticsearch, kibana, elasticsearch-dsl

Why does Elasticsearch return such different results for the same query on different indices?


I'm new to Elasticsearch and Kibana, so please bear with me!

I was given an Elasticsearch cluster that already had an index called dai-* with some pre-ingested data. To experiment safely, I created a new index called ad-prior. I then kept feeding both indices with data like this:

{'obj_id': 'UHDRXEWEEK', 'event_type': 'PREC_AD_STARTED', 'event_id': '5c6b584373d', 'timestamp': 1550540223736L, 'channel_id': '123456789'}
{'obj_id': 'FDREJJSSHE', 'event_type': 'PREC_AD_STARTED', 'event_id': '4f53jhabd24', 'timestamp': 1550540225872L, 'channel_id': '123456789'}

I then ran the following searches in Kibana's Discover:

event_type.keyword:PREC_AD_STARTED
event_type:PREC_AD_STARTED
event_type:'PREC_AD_STARTED'

Index dai-*: the above searches all returned 367 hits.

Index ad-prior: the searches returned different results: event_type:PREC_AD_STARTED returned 8 hits, but the other two returned 0 hits.

Why do these searches return the same result for dai-* but different results for ad-prior?

Update

To answer @Nishant Saini's comment, here is what I believe is the mapping for event_type:

For dai-*:

"event_type": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

For ad-prior:

"event_type": {
  "type": "keyword",
  "ignore_above": 1024
}

Solution

  • Case 1: event_type.keyword:PREC_AD_STARTED

    In index dai-* the property event_type has a sub-field named keyword, and the query above targets that sub-field, i.e. event_type.keyword. In dai-* the documents therefore match and are returned. In index ad-prior, event_type is a plain keyword field with no keyword sub-field, so the query finds nothing.
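A minimal Python sketch of the lookup behind Case 1, assuming the two mappings shown in the question. The helper `resolve_field` is hypothetical: it only mimics how a dotted path such as `event_type.keyword` resolves to a top-level property or one of its multi-fields, and is not an Elasticsearch API.

```python
# Mappings for "event_type" copied from the question.
DAI_MAPPING = {
    "event_type": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }
}
AD_PRIOR_MAPPING = {
    "event_type": {"type": "keyword", "ignore_above": 1024}
}


def resolve_field(mapping, path):
    """Hypothetical helper: return the definition for 'field' or
    'field.subfield', or None if it does not exist in the mapping.
    Only top-level properties and their 'fields' (multi-fields) are checked."""
    name, _, sub = path.partition(".")
    field = mapping.get(name)
    if field is None or not sub:
        return field
    return field.get("fields", {}).get(sub)


for index, mapping in [("dai-*", DAI_MAPPING), ("ad-prior", AD_PRIOR_MAPPING)]:
    field = resolve_field(mapping, "event_type.keyword")
    print(index, "->", field["type"] if field else "no such field, so 0 hits")
```

For dai-* the path resolves to the keyword multi-field; for ad-prior it resolves to nothing, which is why that query cannot match any document there.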

    Case 2: event_type:PREC_AD_STARTED

    event_type is present in both indices. In index dai-* its data type is text, so the standard analyzer is applied by default and the value PREC_AD_STARTED is indexed as the token prec_ad_started. The query applies the same analyzer to the input string, which also becomes prec_ad_started, and therefore matches the documents.

    In index ad-prior, event_type is of type keyword, so the value is indexed as is, without analysis. The search input is not analyzed either, so the query looks for the exact value PREC_AD_STARTED and matches in this case as well.

    Therefore this query returns results for both indices.
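A rough Python approximation of Case 2. `approx_standard` is a hypothetical stand-in for the standard analyzer: it extracts runs of word characters (Python's `\w`, which includes underscores) and lowercases them. The real standard tokenizer follows Unicode word-break rules, so this is only an approximation that happens to agree on simple ASCII values like the ones here. Keyword "analysis" is modeled as keeping the whole value as one term, and a match means any query term equals an indexed term.

```python
import re


def approx_standard(text):
    """Rough stand-in for the standard analyzer: word tokens, lowercased."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def keyword(text):
    """keyword fields are not analyzed: the whole value is a single term."""
    return [text]


def matches(analyzer, stored_value, query_text):
    """Analyze both sides the same way; match if any query term was indexed."""
    indexed = set(analyzer(stored_value))
    return any(term in indexed for term in analyzer(query_text))


stored = "PREC_AD_STARTED"
query = "PREC_AD_STARTED"
print("dai-* (text):", approx_standard(stored), matches(approx_standard, stored, query))
print("ad-prior (keyword):", keyword(stored), matches(keyword, stored, query))
```

Both lines report a match: the text field compares prec_ad_started with prec_ad_started, and the keyword field compares PREC_AD_STARTED with PREC_AD_STARTED.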

    Case 3: event_type:'PREC_AD_STARTED'

    For index dai-*, the query targets the field event_type (not event_type.keyword), which is of type text with the default standard analyzer, so the indexed value PREC_AD_STARTED is stored as prec_ad_started. The query searches for 'PREC_AD_STARTED' (with single quotes), but this string also passes through the standard analyzer, which drops the quotes and lowercases the rest to prec_ad_started, so the query matches.

    In index ad-prior, event_type is of type keyword, which means values are indexed as is, without modification. Since the query targets event_type, no analyzer is applied to the input either, so it searches for the literal value 'PREC_AD_STARTED' (quotes included) rather than PREC_AD_STARTED, and nothing matches.
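The same kind of rough approximation can illustrate Case 3. As above, `approx_standard` is a hypothetical stand-in for the standard analyzer (word tokens, lowercased), not the real Unicode tokenizer, and the single quotes are treated as literal characters in the query text, as the explanation above assumes.

```python
import re


def approx_standard(text):
    """Rough stand-in for the standard analyzer: word tokens, lowercased.
    Like the real standard tokenizer here, it drops the surrounding quotes."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def keyword(text):
    """keyword fields are not analyzed: the value is kept verbatim."""
    return [text]


stored = "PREC_AD_STARTED"
query = "'PREC_AD_STARTED'"  # the single quotes are part of the search text

# dai-* (text): both sides reduce to ['prec_ad_started'], so the query matches.
print(approx_standard(query), set(approx_standard(query)) <= set(approx_standard(stored)))
# ad-prior (keyword): "'PREC_AD_STARTED'" is compared verbatim with "PREC_AD_STARTED".
print(keyword(query), keyword(query) == keyword(stored))
```

The first comparison succeeds because analysis strips the quotes, while the second fails because a keyword field compares the raw strings, quotes included.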