python-3.x elasticsearch md5 elasticsearch-dsl

How to search on md5 fields in elasticsearch_dsl


I'm new to Elasticsearch and elasticsearch_dsl and I have a problem: I want to search on an md5 field, but I don't know if I'm doing it right.

Here is the document I've stored:

"data": {
    "uniqueInfo": {
        "md5_of_my_unique_info": "a3e2c73ab0aaze73881db1a889826ada"
    }
}

md5_of_my_unique_info is a hash of many values, and I want to search for it to find out whether it already exists in the database, so I do this:

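from elasticsearch import Elasticsearch
from elasticsearch_dsl import Q, Search

es_host = {"host": "localhost", "port": 9200}
es = Elasticsearch(hosts=[es_host])
# Double underscores in the keyword argument become dots in the field path
q = Q('bool',
      must=[Q('match', data__uniqueInfo__md5_of_my_unique_info=md5_value_I_want_Input)],
      )
s = Search().using(es).query(q)
# execute() must be called; it returns the response holding the hits
response = s.execute()
for hit in response:
    print(hit.meta.id)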

I've tested it on a small set of data (15 documents) and it seems to work, but I can't test it on more data in my test environment. Can someone tell me if I'm doing it right? If not, how should I do it?

Thank you in advance to anyone who can help me.


Solution

  • I agree with JotaGe in the comment - this is fine if md5_of_my_unique_info is of type keyword (see [0] on how to set mappings in the DSL). Note that if you haven't done anything to the mappings, Elasticsearch should have automatically created a keyword subfield for you.

    Using a term query as a filter will also give you slightly better performance, as Elasticsearch won't have to calculate a score, which shouldn't matter in your case, and you don't have to wrap your query in a bool query.

    Overall your code would look like:

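    from elasticsearch import Elasticsearch
    from elasticsearch_dsl import Search

    es_host = {"host": "localhost", "port": 9200}
    es = Elasticsearch(hosts=[es_host])
    s = Search().using(es)
    # Exact match on the keyword subfield, run in filter context (no scoring)
    s = s.filter('term', data__uniqueInfo__md5_of_my_unique_info__keyword=md5_value_I_want_Input)
    # execute() must be called; it returns the response holding the hits
    response = s.execute()
    for hit in response:
        print(hit.meta.id)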
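
For reference, the filter above should translate to roughly the request body built below. This is only a sketch in plain Python with no client involved: the helper name build_md5_filter_body is hypothetical, and the field path assumes the default dynamic mapping, where the text field gets a .keyword subfield.

```python
import json


def build_md5_filter_body(md5_value):
    """Build a search body with a term filter on the md5 keyword subfield.

    Hypothetical helper for illustration; the field path assumes the
    default dynamic mapping (text field plus a .keyword subfield).
    """
    field = "data.uniqueInfo.md5_of_my_unique_info.keyword"
    # A term query inside bool/filter is an exact match with no scoring.
    return {"query": {"bool": {"filter": [{"term": {field: md5_value}}]}}}


body = build_md5_filter_body("a3e2c73ab0aaze73881db1a889826ada")
print(json.dumps(body, indent=2))
```

Comparing this with the output of s.to_dict() is a quick way to check that the DSL is building the query you expect.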
    

    Hope this helps!

    0 - http://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#doctype
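
To make the mapping remark concrete, here is a minimal sketch of an explicit mapping that declares the hash as a keyword. It assumes Elasticsearch 7 or later (typeless mappings) and is written as a plain dictionary rather than with the DSL classes; the helper name build_md5_mapping is hypothetical. With a mapping like this, the field itself is a keyword, so the term filter would target data.uniqueInfo.md5_of_my_unique_info directly, without the .keyword suffix.

```python
import json


def build_md5_mapping():
    """Return an index body that maps the md5 field as a keyword.

    Hypothetical helper for illustration; assumes the typeless
    mapping format used by Elasticsearch 7 and later.
    """
    md5_field = {"md5_of_my_unique_info": {"type": "keyword"}}
    # Nested objects are described with a "properties" key at each level.
    unique_info = {"uniqueInfo": {"properties": md5_field}}
    return {"mappings": {"properties": {"data": {"properties": unique_info}}}}


mapping = build_md5_mapping()
print(json.dumps(mapping, indent=2))
```

The dictionary would be sent as the request body when the index is created, before any documents are indexed.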