Tags: elasticsearch, kibana, elastic-stack, elasticsearch-6

How do I search across all fields and return every document that contains the search term in Elasticsearch?


I have a problem with searching in Elasticsearch. I have an index with multiple documents, each with several fields. I want to run one query over all the fields and have it return every document that contains the value specified in the query. I found that simple_query_string worked well for this; however, it does not return consistent results. In my index, several fields contain dates. For example:

"revisionDate" : "2008-01-01T00:00:00",
"projectSmirCreationDate" : "2008-07-01T00:00:00",
"changedDate" : "1971-01-01T00:00:00",
"dueDate" : "0001-01-01T00:00:00",

These are just a few examples. However, when I run a query such as:

GET new_document-20_v2/_search
{
  "size": 1000, 
  "query": {
    "simple_query_string" : {
        "query": "2008"
    }
  }
}

It returns only two documents. This is a problem because many more than two documents contain the value "2008" in their fields.

I also have a problem searching file names. In my index there are fields that contain file names like this:

"fileName" : "testPDF.pdf",
"fileName" : "demo.pdf",
"fileName" : "demo.txt",

When I query:

GET new_document-20_v2/_search
{
  "size": 1000, 
  "query": {
    "simple_query_string" : {
        "query": "demo"
    }
  }
}

I get no results, but if I query:

GET new_document-20_v2/_search
{
  "size": 1000, 
  "query": {
    "simple_query_string" : {
        "query": "demo.txt"
    }
  }
}

I get the proper result.

Is there a better way to search across all documents and fields than what I did? I want it to return all the documents matching the query, not just two or zero. Any help would be greatly appreciated.


Solution

  • Elasticsearch uses the standard analyzer when no analyzer is specified. Since no analyzer is specified on "fileName", demo.txt is kept as a single token, because the standard tokenizer does not split on a period between two letters:

    {
      "tokens": [
        {
          "token": "demo.txt",
          "start_offset": 0,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 0
        }
      ]
    }
    

    So a search for demo returns nothing, because no indexed token equals demo, while a search for demo.txt is analyzed into the same single token and matches.
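
    To illustrate why the period does not split the name, here is a rough Python approximation of the standard tokenizer's word-boundary rule for the characters that occur in these file names. The function approx_standard_tokens is hypothetical and handles only ASCII letters, digits, "." and "'"; it is not Lucene's implementation. The real output for any string can be checked with the _analyze API using "analyzer": "standard".

```python
def approx_standard_tokens(text: str) -> list[str]:
    """Hypothetical, simplified approximation of the standard tokenizer.

    Lowercases ASCII letters/digits and keeps "." or "'" inside a token only
    when it sits between two letters or between two digits (the UAX #29 rule
    that keeps "demo.txt" together). Every other character ends a token.
    """
    tokens: list[str] = []
    current: list[str] = []

    def flush() -> None:
        if current:
            tokens.append("".join(current))
            current.clear()

    for i, ch in enumerate(text):
        if ch.isascii() and ch.isalnum():
            current.append(ch.lower())  # the standard analyzer also lowercases
            continue
        if ch in ".'" and current and i + 1 < len(text):
            prev, nxt = text[i - 1], text[i + 1]
            both_letters = prev.isalpha() and nxt.isalpha()
            both_digits = prev.isdigit() and nxt.isdigit()
            if prev.isascii() and nxt.isascii() and (both_letters or both_digits):
                current.append(ch)  # e.g. the "." in "demo.txt" does not split
                continue
        flush()
    flush()
    return tokens


print(approx_standard_tokens("demo.txt"))     # ['demo.txt']
print(approx_standard_tokens("testPDF.pdf"))  # ['testpdf.pdf']
print(approx_standard_tokens("demo txt"))     # ['demo', 'txt']
print("demo" in approx_standard_tokens("demo.txt"))  # False
```

    The query text demo becomes the single token demo, and the only indexed token is demo.txt, so there is no term for the query to match.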


    Instead, you can use a wildcard query to find documents whose fileName contains a term starting with demo:

    {
      "query": {
        "wildcard": {
          "fileName": {
            "value": "demo*"
          }
        }
      }
    }
    

    Search Result:

    "hits": [
          {
            "_index": "67303015",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
              "fileName": "demo.pdf"
            }
          },
          {
            "_index": "67303015",
            "_type": "_doc",
            "_id": "3",
            "_score": 1.0,
            "_source": {
              "fileName": "demo.txt"
            }
          }
        ]
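
    A wildcard query works here because it is a term-level query: the pattern is not analyzed but compared directly with the terms already in the index. Below is a minimal Python sketch of that comparison, using fnmatch.fnmatchcase as a stand-in for the wildcard matcher. The term dictionary, the document ids and the helper wildcard_match are assumptions modeled on the example, not Elasticsearch internals.

```python
from fnmatch import fnmatchcase

# Hypothetical term dictionary for fileName after standard analysis:
# one lowercased term per file name, keyed by an assumed document id.
FILE_NAME_TERMS = {
    "1": ["testpdf.pdf"],
    "2": ["demo.pdf"],
    "3": ["demo.txt"],
}


def wildcard_match(terms_by_doc: dict[str, list[str]], pattern: str) -> list[str]:
    """Return ids of documents with at least one term matching the pattern.

    '*' matches any run of characters and '?' a single character, as in the
    Elasticsearch wildcard syntax. (fnmatchcase also understands [...] sets,
    which Elasticsearch does not, so avoid them in this sketch.)
    """
    return [
        doc_id
        for doc_id, terms in terms_by_doc.items()
        if any(fnmatchcase(term, pattern) for term in terms)
    ]


print(wildcard_match(FILE_NAME_TERMS, "demo*"))  # ['2', '3']
print(wildcard_match(FILE_NAME_TERMS, "demo"))   # []
```

    A pattern with a leading wildcard such as *.pdf also works, but in Elasticsearch it has to check many more terms and can be slow on a large index.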
    

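    Because revisionDate, projectSmirCreationDate, changedDate and dueDate are all of type date, you cannot do a partial search on them: a date field is indexed as a point in time, not as analyzed text, so a query for 2008 cannot match part of the stored value.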
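
    To make this concrete: Elasticsearch converts date values to UTC and stores them as a long number of milliseconds since the epoch. The Python sketch below (the helper to_epoch_millis is hypothetical) computes the numbers that would be indexed for two of the dates in the question, assuming UTC as Elasticsearch does when no time zone is given; neither value contains a "2008" term for a text search to hit.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value: str) -> int:
    """Hypothetical helper: ISO-8601 string without a zone -> epoch millis (UTC)."""
    dt = datetime.fromisoformat(value).replace(tzinfo=timezone.utc)
    # Dividing timedeltas gives an exact integer number of milliseconds.
    return (dt - EPOCH) // timedelta(milliseconds=1)


for value in ["2008-01-01T00:00:00", "2008-07-01T00:00:00"]:
    print(value, "->", to_epoch_millis(value))
# 2008-01-01T00:00:00 -> 1199145600000
# 2008-07-01T00:00:00 -> 1214870400000
```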

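    You can use multi-fields to add one more field, of type text, to each of the date fields above. Modify your index mapping as shown below. (This is the typeless 7.x mapping syntax; on 6.x, nest "properties" under a mapping type such as "_doc". Documents indexed before the mapping change only get the new sub-field after they are reindexed.)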

    {
      "mappings": {
        "properties": {
          "changedDate": {
            "type": "date",
            "fields": {
              "raw": {
                "type": "text"
              }
            }
          },
          "projectSmirCreationDate": {
            "type": "date",
            "fields": {
              "raw": {
                "type": "text"
              }
            }
          },
          "dueDate": {
            "type": "date",
            "fields": {
              "raw": {
                "type": "text"
              }
            }
          },
          "revisionDate": {
            "type": "date",
            "fields": {
              "raw": {
                "type": "text"
              }
            }
          }
        }
      }
    }
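
    With many date fields, writing each entry by hand gets repetitive. Here is a small Python sketch that generates the same mapping body as above for any list of field names. The helper date_fields_with_text_subfield and its parameters are hypothetical; passing mapping_type="_doc" nests the properties under a type, as 6.x expects.

```python
import json


def date_fields_with_text_subfield(field_names, subfield="raw", mapping_type=None):
    """Build a mapping in which every date field also gets a text sub-field.

    Hypothetical helper: mapping_type=None produces the typeless body shown
    above; a type name such as "_doc" wraps the properties for 6.x.
    """
    properties = {
        name: {"type": "date", "fields": {subfield: {"type": "text"}}}
        for name in field_names
    }
    body = {"properties": properties}
    if mapping_type is not None:
        body = {mapping_type: body}
    return {"mappings": body}


mapping = date_fields_with_text_subfield(
    ["changedDate", "projectSmirCreationDate", "dueDate", "revisionDate"]
)
print(json.dumps(mapping, indent=2))
```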
    

    Index Data:

    {
      "revisionDate": "2008-02-01T00:00:00",
      "projectSmirCreationDate": "2008-02-01T00:00:00",
      "changedDate": "1971-01-01T00:00:00",
      "dueDate": "0001-01-01T00:00:00"
    }
    {
      "revisionDate": "2008-01-01T00:00:00",
      "projectSmirCreationDate": "2008-07-01T00:00:00",
      "changedDate": "1971-01-01T00:00:00",
      "dueDate": "0001-01-01T00:00:00"
    }
    

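    Search Query (with no fields listed, multi_match searches the fields in index.query.default_field, which defaults to *, so the new raw sub-fields are included):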

    {
      "query": {
        "multi_match": {
          "query": "2008"
        }
      }
    }
    

    Search Result:

    "hits": [
          {
            "_index": "67303015",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
              "revisionDate": "2008-01-01T00:00:00",
              "projectSmirCreationDate": "2008-07-01T00:00:00",
              "changedDate": "1971-01-01T00:00:00",
              "dueDate": "0001-01-01T00:00:00"
            }
          },
          {
            "_index": "67303015",
            "_type": "_doc",
            "_id": "1",
            "_score": 0.18232156,
            "_source": {
              "revisionDate": "2008-02-01T00:00:00",
              "projectSmirCreationDate": "2008-02-01T00:00:00",
              "changedDate": "1971-01-01T00:00:00",
              "dueDate": "0001-01-01T00:00:00"
            }
          }
        ]
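
    Both documents match because each raw sub-field is analyzed into text terms, and one of those terms is 2008. Below is a simplified Python sketch of that inverted index. The tokenizer simple_tokens is an assumption that only splits on characters other than ASCII letters and digits; it is not the standard tokenizer, but it does produce a "2008" term for these values.

```python
import re
from collections import defaultdict


def simple_tokens(text: str) -> list[str]:
    # Simplified stand-in for the standard analyzer: lowercase, then keep
    # runs of ASCII letters and digits. Not the real standard tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


# The two documents from the example above, keyed by _id.
docs = {
    "1": {
        "revisionDate": "2008-02-01T00:00:00",
        "projectSmirCreationDate": "2008-02-01T00:00:00",
        "changedDate": "1971-01-01T00:00:00",
        "dueDate": "0001-01-01T00:00:00",
    },
    "2": {
        "revisionDate": "2008-01-01T00:00:00",
        "projectSmirCreationDate": "2008-07-01T00:00:00",
        "changedDate": "1971-01-01T00:00:00",
        "dueDate": "0001-01-01T00:00:00",
    },
}

# Inverted index over the *.raw text sub-fields: term -> ids of matching docs.
inverted = defaultdict(set)
for doc_id, fields in docs.items():
    for value in fields.values():
        for term in simple_tokens(value):
            inverted[term].add(doc_id)

print(sorted(inverted.get("2008", set())))  # ['1', '2']
```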