elasticsearch, search-engine, django-haystack

Elasticsearch: do exact searches where the query contains special characters like '#'


How can I get only the documents that contain '#test', and ignore the documents that contain just 'test', in Elasticsearch?


Solution

  • People may gripe at you about this question, so I'll note that it was in response to my comment on this post.

    You're probably going to want to read up on analysis in Elasticsearch, as well as match queries versus term queries.

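    Anyway, the convention here is to use a .raw sub-field on a string field. The base field is analyzed at index time (with the default standard analyzer, "#test" is indexed as the token "test"), while the sub-field keeps the exact value as a single term. That way, if you want to do searches involving analysis, you can use the base field, but if you want to search for exact (un-analyzed) values, you can use the sub-field.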

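    So here is a simple mapping that accomplishes this (this is the pre-5.0 syntax; from Elasticsearch 5.0 on, the "string" type was replaced by "text" and "keyword", so the sub-field would be of type "keyword"):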

    PUT /test_index
    {
       "mappings": {
          "doc": {
             "properties": {
                "post_text": {
                   "type": "string",
                   "fields": {
                      "raw": {
                         "type": "string",
                         "index": "not_analyzed"
                      }
                   }
                }
             }
          }
       }
    }
    

    Now if I add these two documents:

    PUT /test_index/doc/1
    {
        "post_text": "#test"
    }
    
    PUT /test_index/doc/2
    {
        "post_text": "test"
    }
    

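    A "match" query against the base field will return both, because the query text "#test" is analyzed the same way as the indexed text and the "#" is dropped: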

    POST /test_index/_search
    {
        "query": {
            "match": {
               "post_text": "#test"
            }
        }
    }
    ...
    {
       "took": 2,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
       },
       "hits": {
          "total": 2,
          "max_score": 0.5945348,
          "hits": [
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "1",
                "_score": 0.5945348,
                "_source": {
                   "post_text": "#test"
                }
             },
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "2",
                "_score": 0.5945348,
                "_source": {
                   "post_text": "test"
                }
             }
          ]
       }
    }
    

    But the "term" query below, which is not analyzed and runs against the un-analyzed sub-field, will return only document 1:

    POST /test_index/_search
    {
        "query": {
            "term": {
               "post_text.raw": "#test"
            }
        }
    }
    ...
    {
       "took": 2,
       "timed_out": false,
       "_shards": {
          "total": 1,
          "successful": 1,
          "failed": 0
       },
       "hits": {
          "total": 1,
          "max_score": 1,
          "hits": [
             {
                "_index": "test_index",
                "_type": "doc",
                "_id": "1",
                "_score": 1,
                "_source": {
                   "post_text": "#test"
                }
             }
          ]
       }
    }
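
To make the difference between the two queries concrete, here is a toy model in Python. It is only a sketch: `analyze_standard_like` is a rough stand-in for the standard analyzer (lowercase, keep runs of letters and digits), not Elasticsearch's actual implementation, and all function names are made up for illustration.

```python
import re
from collections import defaultdict


def analyze_standard_like(text):
    """Rough stand-in for an analyzed field: lowercase, keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def analyze_not_analyzed(text):
    """Stand-in for a not_analyzed field: the whole value is one term."""
    return [text]


def build_index(docs, analyzer):
    """Map each indexed term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index[term].add(doc_id)
    return index


def match_query(index, analyzer, query_text):
    """Like "match": analyze the query text, then look up each resulting term."""
    hits = set()
    for term in analyzer(query_text):
        hits |= index.get(term, set())
    return sorted(hits)


def term_query(index, term):
    """Like "term": look up the query value as-is, with no analysis."""
    return sorted(index.get(term, set()))


docs = {"1": "#test", "2": "test"}
base_index = build_index(docs, analyze_standard_like)  # models post_text
raw_index = build_index(docs, analyze_not_analyzed)    # models post_text.raw

print(match_query(base_index, analyze_standard_like, "#test"))  # ['1', '2']
print(term_query(raw_index, "#test"))                           # ['1']
```

In this model a "term" lookup for "#test" against the analyzed index finds nothing, since only the token "test" was ever stored there. That is why the exact search has to target the .raw sub-field.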
    

    Here is the code I used to test it:

    http://sense.qbox.io/gist/2f0fbb38e2b7608019b5b21ebe05557982212ac7
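
In case the link above is unavailable, the same sequence of requests can be sketched as a small Python script using only the standard library. The base URL, the helper names, and the extra `_refresh` call (added so the searches see the newly indexed documents when run back to back) are assumptions, not part of the original test code. The mapping uses the same pre-5.0 syntax as the walkthrough above.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node; adjust for your cluster.
BASE_URL = "http://localhost:9200"

MAPPING = {
    "mappings": {
        "doc": {
            "properties": {
                "post_text": {
                    "type": "string",
                    "fields": {
                        "raw": {"type": "string", "index": "not_analyzed"}
                    },
                }
            }
        }
    }
}


def match_search_body(field, text):
    """Body of an analyzed "match" search (hypothetical helper)."""
    return {"query": {"match": {field: text}}}


def term_search_body(field, value):
    """Body of an exact-value "term" search (hypothetical helper)."""
    return {"query": {"term": {field: value}}}


# (method, path, body) triples in the same order as the walkthrough.
REQUESTS = [
    ("PUT", "/test_index", MAPPING),
    ("PUT", "/test_index/doc/1", {"post_text": "#test"}),
    ("PUT", "/test_index/doc/2", {"post_text": "test"}),
    ("POST", "/test_index/_refresh", None),  # assumption: make docs searchable now
    ("POST", "/test_index/_search", match_search_body("post_text", "#test")),
    ("POST", "/test_index/_search", term_search_body("post_text.raw", "#test")),
]


def send(method, path, body=None, base_url=BASE_URL):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_all(base_url=BASE_URL):
    """Replay every request in order and print each response."""
    for method, path, body in REQUESTS:
        print(method, path, send(method, path, body, base_url))
```

Nothing is sent until `run_all()` is called, so the script can be imported and inspected without a running cluster.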