
Searching for an empty string in Elasticsearch


I'm using Elasticsearch 6.5.

I need to include an empty-string match in one of the criteria I'm passing:

primaryKey = 1, 2, 3

subKey = "" or subKey = "A", along with a bunch of other criteria.

I've been unable to get the record that has the empty subKey.

I've tried using must_not with exists, but it doesn't fetch the record in question.

The query below should return any records that have a primaryKey of 1, 2, or 3 and a subKey of 'A' or the empty string, filtered by the date provided. I get all the records except the one where the subKey is blank.

This is what I've tried:

{
  "size": 200, "from": 0,
  "query": {
    "bool": {
      "must": [{
                "bool": {
                  "should": [{ "terms": {"primaryKey": [1,2,3] }}]
                }
              },
              {
                "bool": {
                  "should": [ 
                              {"match": {"subKey": "A"}}, 
                              {
                                "bool" : {
                                  "must_not": [{ "exists": { "field": "subKey"} }]
                                }
                              }
                            ]
                }
              }],
      "filter": [{"range": {"startdate": {"lte": "2018-11-01"}}}]
    }
  }
}

The subKey field is special in that it's searched by individual letter. I don't think that affects anything, but here is the NEST code I have for that index.

// Elasticsearch only accepts lowercase index names, so the name is lowercased here.
new CreateIndexDescriptor("specialindex").Settings(s => s
                .Analysis(a => a
                        .Analyzers(aa => aa
                            .Custom("subKey_analyzer", ma => ma
                                .Tokenizer("subKey_tokenizer")
                                .Filters("lowercase")
                            )
                        )
                        .Tokenizers(ta => ta
                            .NGram("subKey_tokenizer", t => t
                                .MinGram(1)
                                .MaxGram(1)
                                .TokenChars(new TokenChar[] { TokenChar.Letter, TokenChar.Whitespace })
                            )
                        )
                    )
                )
                .Mappings(ms => ms
                    .Map<SpecialIndex>(m => m
                        .Properties(p => p
                            .Text(s => s
                                .Name(x => x.subKey)
                                .Analyzer("subKey_analyzer")
                            )
                        )
                    ));

Any ideas on how to resolve this? Thank you very much!

NOTE: I've seen posts saying this can be done with a filter, using missing. But as you can see from the query, I need the query to do this, not the filter.

I've also tried the following instead of the must_not with exists:

{
    "term": { "subKey": { "value": "" }}
}

but that doesn't work either. I'm thinking I need another tokenizer to get this working.
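
One way to check the tokenizer hunch is to run the empty value through the index's analyzer with the _analyze API. The body below is a minimal sketch meant to be sent as GET /specialindex/_analyze. The lowercase index name is an assumption (Elasticsearch rejects index names that contain uppercase letters); subKey_analyzer is the analyzer defined above.

```json
{
  "analyzer": "subKey_analyzer",
  "text": ""
}
```

If the response contains no tokens, there is nothing in the inverted index for a term query on "" to match, which would explain the behaviour above. Swapping the text for "A" shows what a non-empty value produces, for comparison.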


Solution

  • OK, I managed to fix this by using multi-fields. This is what I did.

    I changed the mappings to this:

                      .Mappings(ms => ms
                        .Map<SpecialIndex>(m => m
                            .Properties(p => p
                                .Text(s => s
                                    .Name(x => x.subKey)
                                    .Fields(ff => ff
                                        .Text(tt => tt
                                            .Name("subKey")
                                            .Analyzer("subKey_analyzer")
                                        )
                                        .Keyword(k => k
                                            .Name("keyword")
                                            .IgnoreAbove(5)
                                        )
                                    )
                                )
                            )
                        ));
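
    For reference, the mapping above should correspond roughly to the JSON below. This is a sketch, not output captured from the cluster: the type name _doc is an assumption (NEST infers the real one from the SpecialIndex class), and only the subKey property is shown.

    ```json
    {
      "mappings": {
        "_doc": {
          "properties": {
            "subKey": {
              "type": "text",
              "fields": {
                "subKey": { "type": "text", "analyzer": "subKey_analyzer" },
                "keyword": { "type": "keyword", "ignore_above": 5 }
              }
            }
          }
        }
      }
    }
    ```

    Note that the parent subKey field no longer names an analyzer, so it falls back to the default one. The letter-by-letter analysis now lives on subKey.subKey, while subKey.keyword indexes the whole value as a single unanalyzed term, which is what lets an empty string be matched.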
    

    Then I changed the bool piece of my query to this:

                "bool": {
                    "should": [{
                        "match": {
                            "subKey.subKey": {
                                "query": "A"
                            }
                        }
                    },
                    {
                        "term": {
                            "subKey.keyword": {
                                "value": ""
                            }
                        }
                    }]
                }
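
    Put back into the original request, the whole query might look like the sketch below. It reuses the primaryKey clause and the startdate filter from the question. Flattening the single-clause should around primaryKey into a plain terms clause and spelling out minimum_should_match are my own additions for readability, not part of the fix itself.

    ```json
    {
      "size": 200,
      "from": 0,
      "query": {
        "bool": {
          "must": [
            { "terms": { "primaryKey": [1, 2, 3] } },
            {
              "bool": {
                "should": [
                  { "match": { "subKey.subKey": { "query": "A" } } },
                  { "term": { "subKey.keyword": { "value": "" } } }
                ],
                "minimum_should_match": 1
              }
            }
          ],
          "filter": [
            { "range": { "startdate": { "lte": "2018-11-01" } } }
          ]
        }
      }
    }
    ```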
    

    What I don't really like about this is that I think Elasticsearch is creating an additional field just to find empty strings in the same field. That really doesn't seem ideal.

    If anyone has another suggestion, that would be great!

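    [UPDATE] The NEST implementation needs to use Suffix to access the multi-fields. The Verbatim() call matters as well: NEST treats a term query with an empty value as conditionless and leaves it out of the request unless the query is marked verbatim.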

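    .Bool(bb => bb
       .Should(bbs => bbs
          // Letter match against the analyzed sub-field (subKey.subKey)
          .Match(m => m.Field(f => f.subKey.Suffix("subKey")).Query(search.subKey)),
          bbs => bbs
          // Exact empty-string match against the keyword sub-field (subKey.keyword)
          .Term(t => t.Verbatim().Field(f => f.subKey.Suffix("keyword")).Value(string.Empty))))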