elasticsearch, elasticsearch-query, elasticsearch-analyzers

Partial search in Elasticsearch works for one record but not for another


The Elasticsearch index is created with the following body:

body = {
        "mappings": {
            "properties": {
                "TokenizedDocumentFileName": {
                    "type": "text",
                    "analyzer": "my_analyzer",
                    "search_analyzer": "standard"
                }
            }
        },
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "keyword",
                        "filter": ["word_delimiter",
                                   "lowercase"
                                   ]
                    }
                }
            },
            "number_of_shards": "1",
            "number_of_replicas": "0"
        }
    }

Below are two documents stored in this index. The first record:

{'_index': 'fileboundunitmanuals',
 '_type': '_doc',
 '_id': '997439.PDF',
 '_version': 2,
 '_seq_no': 166958, 
 '_primary_term': 1,
 'found': True, '_source': {
 'IndexKey': '997439.PDF',
 'DocumentID': 997439,
 'Extension': 'PDF',
 'FileID': 174508,
 'DocumentFileName': '\\UNIT xxxxx\\411xxx\\A9.xxxx_xxxxx GAS ENGINE xxxxx_x_997439.PDF',
 'TokenizedDocumentFileName': '\\UNIT xxxxx\\411xxx\\A9. xxxx xxxxx GAS ENGINE xxxxx x 997439.PDF',
 'F1': 'UNIT xxxxx', 
 'ProjectID': 8}}  

The second record:

 {'_index': 'fileboundunitmanuals',
  '_type': '_doc',
  '_id': '3929829.pdf',
  '_version': 1,
  '_seq_no': 538517,
  '_primary_term': 3, 
  'found': True, '_source': {
  'Extension': 'pdf', 
  'DocumentID': 3929829,
  'IndexKey': '3929829.pdf',
  'FileID': '',
  'DocumentFileName': '\\Unit xxxxx\\Mary Testing\\marynewfiletest.pdf', 
  'TokenizedDocumentFileName': '\\Unit xxxxx\\Mary Testing\\marynewfiletest.pdf',
  'F1': 'Unit xxxxx',
  'ProjectID': 8}}

I search for the first record with the following query:

  {
    "query": {
      "bool": {
        "must": {
          "match": {
            "TokenizedDocumentFileName": {
              "query": "997439"
            }
          }
        },
        "filter": {
          "bool": {
            "must": [
              {
                "term": {
                  "ProjectID": 8
                }
              },
              {
                "term": {
                  "Extension": "pdf"
                }
              }
            ]
          }
        }
      }
    }
  }

The query for the second record:

  {
    "query": {
      "bool": {
        "must": {
          "match": {
            "TokenizedDocumentFileName": {
              "query": "marynewfiletest"
            }
          }
        },
        "filter": {
          "bool": {
            "must": [
              {
                "term": {
                  "ProjectID": 8
                }
              },
              {
                "term": {
                  "Extension": "pdf"
                }
              }
            ]
          }
        }
      }
    }
  }

The first query returns the right result, since "997439" is present in TokenizedDocumentFileName. But when I search for "marynewfiletest" to find the second record, I get the following response:

{'took': 0, 'timed_out': False, '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 0, 'relation': 'eq'}, 'max_score': None, 'hits': []}}

When I search for the file name together with its extension, i.e. "marynewfiletest.pdf", I do get the right result.

Output of GET fileboundunitmanuals:

{
"fileboundunitmanuals" : {
"aliases" : { },
"mappings" : {
  "properties" : {
    "DocumentFileName" : {
      "type" : "text",
      "analyzer" : "my_analyzer",
      "search_analyzer" : "standard"
    },
    "DocumentID" : {
      "type" : "long"
    },
    "Extension" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "F1" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "FileID" : {
      "type" : "long"
    },
    "IndexKey" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "ProjectID" : {
      "type" : "long"
    },
    "TokenizedDocumentFileName" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    }
  }
},
"settings" : {
  "index" : {
    "number_of_shards" : "1",
    "provided_name" : "fileboundunitmanuals",
    "creation_date" : "1607069298331",
    "analysis" : {
      "analyzer" : {
        "my_analyzer" : {
          "filter" : [
            "word_delimiter",
            "lowercase"
          ],
          "tokenizer" : "keyword"
        }
      }
    },
    "number_of_replicas" : "0",
    "uuid" : "u8HasYfVT6iMr7XGpdjJHg",
    "version" : {
      "created" : "7090199"
    }
  }
}
}
}

So the question is: why does partial search work for the first record but not for the second?


Solution

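  • According to the mapping returned by GET, the field TokenizedDocumentFileName is a plain text field with a keyword sub-field, so it does not use my_analyzer and falls back to the default standard analyzer. (In the live index, my_analyzer is attached to DocumentFileName instead.) It is therefore a coincidence that your first query works: the standard tokenizer most likely splits 997439.PDF into 997439 and pdf, because the dot sits between a digit and a letter, but keeps marynewfiletest.pdf as a single token, because the dot sits between two letters. That would also explain why searching for the name together with its extension matches.

    Make sure you create the index with the right mapping before indexing your first document. Otherwise Elasticsearch maps the field dynamically as text plus keyword.

    PS: I was able to create your index with the settings/mappings you gave and I got the expected result for both queries, so you're almost there.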
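
A minimal sketch of how the mapping could be checked before any document is indexed. It assumes the index is managed from Python through an elasticsearch-py 7.x client, which the question's Python-style output suggests but does not state. The helper names field_analyzer, ensure_index and analyzed_tokens are hypothetical; they rely only on the client calls es.indices.exists, es.indices.create, es.indices.get_mapping and es.indices.analyze.

```python
def field_analyzer(mapping_response, index, field):
    """Return the index-time analyzer set on `field`, or None if it uses the default."""
    properties = mapping_response[index]["mappings"].get("properties", {})
    return properties.get(field, {}).get("analyzer")


def ensure_index(es, index, body):
    """Create `index` from `body` if it does not exist yet; return its live mapping.

    `es` is assumed to be an elasticsearch-py 7.x client (hypothetical setup).
    """
    if not es.indices.exists(index=index):
        es.indices.create(index=index, body=body)
    return es.indices.get_mapping(index=index)


def analyzed_tokens(es, index, field, text):
    """Ask the _analyze API which tokens `field` would produce for `text`."""
    response = es.indices.analyze(index=index, body={"field": field, "text": text})
    return [t["token"] for t in response["tokens"]]


# Trimmed copy of the mapping shown in the question's GET output.
live_mapping = {
    "fileboundunitmanuals": {
        "mappings": {
            "properties": {
                "DocumentFileName": {
                    "type": "text",
                    "analyzer": "my_analyzer",
                    "search_analyzer": "standard",
                },
                "TokenizedDocumentFileName": {"type": "text"},
            }
        }
    }
}

print(field_analyzer(live_mapping, "fileboundunitmanuals", "DocumentFileName"))
# my_analyzer
print(field_analyzer(live_mapping, "fileboundunitmanuals", "TokenizedDocumentFileName"))
# None
```

Against a live cluster, analyzed_tokens(es, "fileboundunitmanuals", "TokenizedDocumentFileName", "marynewfiletest.pdf") would list the tokens the field produces for the second record's file name, which shows directly whether a partial term such as marynewfiletest can match.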