search lucene elasticsearch tokenize

Elasticsearch wildcard search on not_analyzed field


I have an index with the following settings and mapping:

{
  "settings":{
     "index":{
        "analysis":{
           "analyzer":{
              "analyzer_keyword":{
                 "tokenizer":"keyword",
                 "filter":"lowercase"
              }
           }
        }
     }
  },
  "mappings":{
     "product":{
        "properties":{
           "name":{
              "analyzer":"analyzer_keyword",
              "type":"string",
              "index": "not_analyzed"
           }
        }
     }
  }
}

I am struggling to implement a wildcard search on the name field. My example data looks like this:

[
{"name": "SVF-123"},
{"name": "SVF-234"}
]

When I run the following query:

curl -XGET 'http://localhost:9200/my_index/product/_search' -d '
{
    "query": {
        "filtered" : {
            "query" : {
                "query_string" : {
                    "query": "*SVF-1*"
                }
            }
        }

    }
}'

It returns both SVF-123 and SVF-234. I think it still tokenizes the data. It should return only SVF-123.

Could you please help with this?

Thanks in advance


Solution

  • My solution adventure

    I started from the setup shown in my question. Whenever I changed one part of my settings, one thing started to work but another stopped working. Here is the history of my solution:

    1.) I indexed my data with the default settings, which means it was analyzed by default. This caused a problem on my side. For example:

    When a user searched for a keyword like SVF-1, the system ran this query:

    {
        "query": {
            "filtered" : {
                "query" : {
                    "query_string" : {
                        "analyze_wildcard": true,
                        "query": "*SVF-1*"
                    }
                }
            }
    
        }
    }
    

    and the results were:

    SVF-123
    SVF-234
    

    This is expected, because the name field of my documents is analyzed. The query is split into the tokens SVF and 1, and SVF matches both documents even though 1 does not. I abandoned this approach and created a mapping that makes my fields not_analyzed:

    {
      "mappings":{
         "product":{
            "properties":{
               "name":{
                  "type":"string",
                  "index": "not_analyzed"
               },
               "site":{
                  "type":"string",
                  "index": "not_analyzed"
               } 
            }
         }
      }
    }
    

    but my problem continued.
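
To illustrate the tokenization described in step 1, here is a rough Python sketch. The helper `rough_standard_analyze` is hypothetical and only approximates the default analyzer by lowercasing and splitting on non-alphanumeric characters; the real analyzer is more involved, so treat this as an illustration rather than Elasticsearch's actual behaviour.

```python
import re


def rough_standard_analyze(text):
    """Crude stand-in (assumption) for the default analyzer:
    lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


docs = ["SVF-123", "SVF-234"]
query_tokens = rough_standard_analyze("SVF-1")
print(query_tokens)  # ['svf', '1']

for name in docs:
    doc_tokens = rough_standard_analyze(name)
    # Tokens that the query and the document have in common.
    shared = sorted(set(query_tokens) & set(doc_tokens))
    print(name, doc_tokens, "shared:", shared)
```

Both documents share the token svf with the query, which is why both come back. As far as I understand, a query_string query with no field specified searches the _all field, which is still analyzed even when name is not_analyzed, and that would explain why changing the mapping alone did not help.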

    2.) After a lot of research I decided to try another way and use a wildcard query. My query was:

    {
        "query": {
            "filtered" : {
                "query" : {
                    "wildcard" : {
                        "name" : {
                            "value" : "*SVF-1*"
                        }
                    }
                },
                "filter" : {
                    "term" : {"site" : "pro_en_GB"}
                }
            }
        }
    }
    

    This query worked, but there is one problem. My fields are no longer analyzed and I am running a wildcard query, so matching is case sensitive. If I search for svf-1, it returns nothing, and a user may well type the query in lowercase.
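
The case-sensitivity issue can be mimicked outside Elasticsearch. This is only a sketch: it uses Python's `fnmatch.fnmatchcase` as a stand-in for a wildcard query matched against the whole stored term, and `wildcard_hits` is a hypothetical helper. Note that `fnmatch` also gives `[` a special meaning, which the wildcard query does not, so the analogy only holds for `*` and `?`.

```python
from fnmatch import fnmatchcase

# A not_analyzed field stores each value as a single, unmodified term.
stored_terms = ["SVF-123", "SVF-234"]


def wildcard_hits(pattern, terms):
    """Return the terms that the case-sensitive pattern matches in full."""
    return [term for term in terms if fnmatchcase(term, pattern)]


print(wildcard_hits("*SVF-1*", stored_terms))  # ['SVF-123']
print(wildcard_hits("*svf-1*", stored_terms))  # []
```

The uppercase pattern now matches only SVF-123, but the lowercase one matches nothing, which is the behaviour described above.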

    3.) I changed my mapping to:

    {
      "mappings":{
         "product":{
            "properties":{
               "name":{
                  "type":"string",
                  "index": "not_analyzed"
               },
               "nameLowerCase":{
                  "type":"string",
                  "index": "not_analyzed"
               },
               "site":{
                  "type":"string",
                  "index": "not_analyzed"
               } 
            }
         }
      }
    }
    

    I added one more field for the name, called nameLowerCase. When indexing, I set up my document like this:

    {
        "name": "SVF-123",
        "nameLowerCase": "svf-123",
        "site": "pro_en_GB"
    }
    

    Here, I convert the query keyword to lowercase and search on the new nameLowerCase field, while displaying the name field.

    The final version of my query is:
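
The two lowercasing steps, one at index time and one at query time, can be sketched in Python. The function names `to_indexed_doc` and `to_wildcard_value` are hypothetical; only the field names come from the mapping above.

```python
def to_indexed_doc(name, site):
    """Build the document to index, adding a lowercased copy of the name."""
    return {"name": name, "nameLowerCase": name.lower(), "site": site}


def to_wildcard_value(keyword):
    """Lowercase the user's keyword and wrap it for a 'contains' match."""
    return "*" + keyword.lower() + "*"


print(to_indexed_doc("SVF-123", "pro_en_GB"))
# {'name': 'SVF-123', 'nameLowerCase': 'svf-123', 'site': 'pro_en_GB'}
print(to_wildcard_value("SVF-1"))  # *svf-1*
print(to_wildcard_value("svf-1"))  # *svf-1*
```

Whatever case the user types, the value sent to the wildcard query is the same, and it is compared against a field that was lowercased in the same way. This sketch does not escape `*` or `?` typed by the user, so those would still act as wildcards.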

    {
        "query": {
            "filtered" : {
                "query" : {
                    "wildcard" : {
                        "nameLowerCase" : {
                            "value" : "*svf-1*"
                        }
                    }
                },
                "filter" : {
                    "term" : {"site" : "pro_en_GB"}
                }
            }
        }
    }
    

    Now it works. Another way to solve this problem is to use multi_field. My query contains a dash (-), which caused me some problems.
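
For the multi-field alternative mentioned above, here is an untested sketch of what the index body might look like, written as a Python dict so it can be dumped to JSON. It assumes the `fields` sub-field syntax of Elasticsearch 1.x and reuses the analyzer_keyword analyzer from the question; the sub-field name `lowercase` is hypothetical, and I have not run this against a cluster.

```python
import json

index_body = {
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    # Keeps the whole value as one token, but lowercased.
                    "analyzer_keyword": {
                        "tokenizer": "keyword",
                        "filter": "lowercase",
                    }
                }
            }
        }
    },
    "mappings": {
        "product": {
            "properties": {
                "name": {
                    "type": "string",
                    "index": "not_analyzed",
                    # Hypothetical sub-field, addressed as "name.lowercase".
                    "fields": {
                        "lowercase": {
                            "type": "string",
                            "analyzer": "analyzer_keyword",
                        }
                    },
                },
                "site": {"type": "string", "index": "not_analyzed"},
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```

With a mapping along these lines, Elasticsearch would do the index-time lowercasing itself, so the separate nameLowerCase field would not be needed. The wildcard query would then target name.lowercase, and the keyword would still have to be lowercased on the application side, as far as I know, because wildcard queries are not analyzed in these versions.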

    Lots of thanks to @Alex Brasetvik for his detailed explanation and effort