Tags: python, elasticsearch, solr, lucene

Solr distance search with proximity


I am trying to filter results using proximity search, and I am finding it difficult to construct the correct query.

So, in my index I have the following entry:

      {
        "aff": "lg electronics",
        "shortuuid": "sddsd3ww",
        "name": "changhee kim",
        "id": "hjgjh-7678ghjhjhj-fdsfdg",
        "_version_": 1764697833293742080
      }

I then try a small variation of the name (a space and an `a` appended):

import requests

name = "changhee kim a"
org = "lg electronics"

# Quoted phrases with a ~N suffix (proximity search), no field prefix
requests.get('http://localhost:8983/solr/searcher/select', params={
    'q': f'"{name}"~10 AND "{org}"~1',
    'wt': 'json',
    'rows': 1,
    'start': 0,
}).json()

and it returns 0 results. Why? I would have thought that, since the query differs from the indexed name by only two characters (the space and the `a`), it is well within `~10` and should return the entry above.
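A toy model of sloppy phrase matching helps show why the slop cannot rescue this query. It is a simplified sketch, not Lucene's implementation: `phrase_matches` is a hypothetical helper, and the field is assumed to be lowercased and split on whitespace. Every phrase term has to occur in the field; the slop only limits how far the matched terms may be shifted relative to their positions in the phrase. Since `a` never occurs in `changhee kim`, no slop value makes the phrase match.

```python
from itertools import product


def phrase_matches(field_tokens, phrase_tokens, slop):
    """Toy model of a sloppy phrase query on one field (hypothetical helper).

    For each chosen occurrence of every phrase term, subtract the term's
    offset within the phrase from its position in the field. The phrase
    matches if, for some choice, the spread (max - min) of those adjusted
    positions is at most `slop`.
    """
    # Positions of every phrase term in the field; a missing term has none.
    positions = [[i for i, tok in enumerate(field_tokens) if tok == term]
                 for term in phrase_tokens]
    if any(not p for p in positions):
        return False  # a phrase never matches if any of its terms is absent
    for choice in product(*positions):
        if len(set(choice)) < len(choice):
            continue  # two phrase terms cannot share one field position
        adjusted = [pos - offset for offset, pos in enumerate(choice)]
        if max(adjusted) - min(adjusted) <= slop:
            return True
    return False


# Assumption: the name field is lowercased and split on whitespace.
indexed = "changhee kim".split()
print(phrase_matches(indexed, "changhee kim".split(), 0))     # True
print(phrase_matches(indexed, "kim changhee".split(), 2))     # True: reversing costs 2
print(phrase_matches(indexed, "changhee kim a".split(), 10))  # False: "a" is not indexed
```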

EDIT

Following @Eric's answer:

import requests

name = "changhee kim a"  # a space and an `a` added at the end
org = "lg electronics"

# No quotes this time; field prefixes plus a ~N suffix
requests.get('http://localhost:8983/solr/searcher/select', params={
    'q': f'name:{name}~10 AND aff:{org}~1',
    'wt': 'json',
    'rows': 1,
    'start': 0,
}).json()

I still get no results for this query.
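Printing the interpolated query string helps explain this. In the standard Lucene/Solr query syntax, a `field:` prefix applies only to the term directly after it, and a `~N` suffix applies only to the term it is attached to. So here `kim`, `a~10` and `electronics~1` are not restricted to the `name`/`aff` fields at all; they go to whatever default search field is configured. A minimal sketch (the whitespace split below is plain Python, not Solr's parser):

```python
name = "changhee kim a"
org = "lg electronics"

# The same f-string as in the question; printing it shows what is sent as q.
q = f'name:{name}~10 AND aff:{org}~1'
print(q)  # name:changhee kim a~10 AND aff:lg electronics~1

# Only "changhee" and "lg" carry a field prefix, and each ~N sticks to one token.
print(q.split())
# ['name:changhee', 'kim', 'a~10', 'AND', 'aff:lg', 'electronics~1']
```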

However, when I edit characters inside a word:

import requests

name = "changh kim"  # deleted the two `e`s from `changhee`
org = "lg electronics"

requests.get('http://localhost:8983/solr/searcher/select', params={
    'q': f'name:{name}~10 AND aff:{org}~1',
    'wt': 'json',
    'rows': 1,
    'start': 0,
}).json()

...it returns the expected result.

Appending characters to the last word also works:

import requests

name = "changhee kima"  # appended `a` to `kim`
org = "lg electronics"

requests.get('http://localhost:8983/solr/searcher/select', params={
    'q': f'name:{name}~10 AND aff:{org}~1',
    'wt': 'json',
    'rows': 1,
    'start': 0,
}).json()

So what does not work is adding a separate word at the end:

name = "changhee kim a"  # added an additional `a` at the end
org = "lg electronics"

Why is that?
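One way to see the pattern: the variants that work stay within one or two character edits of an indexed name token, while the lone `a` is at least three edits from either token. The sketch below assumes the `name` field is split on whitespace and uses plain Levenshtein distance (Lucene's fuzzy matching additionally counts transpositions by default). The Solr Reference Guide documents the fuzzy distance as between 0 and 2 edits, so `~10` cannot allow more than that. Treat this as an illustration only; the full behaviour also depends on which field the unprefixed terms are searched in.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions (no transpositions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


# Assumption: the indexed name "changhee kim" is tokenized on whitespace.
indexed_tokens = ["changhee", "kim"]

for query_token in ["changh", "kima", "a"]:
    closest = min(indexed_tokens, key=lambda t: levenshtein(query_token, t))
    print(query_token, "->", closest, levenshtein(query_token, closest))
# changh -> changhee 2
# kima -> kim 1
# a -> kim 3
```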


Solution

  • A proximity search is a phrase query (double-quoted) that allows the terms to move around to match the specified phrase, i.e. it matches documents where all the phrase terms occur within the specified distance of one another in the same field.

    What you want instead is a fuzzy search on two different fields (remove the double quotes and specify the relevant fields):

    q = f'name:{name}~10 AND aff:{org}~1'
    

    NB: You will need to escape whitespace, i.e. name = 'changhee\ kim\ a', so that Solr applies the edit distance to the searched string as a whole. This works only if the involved fields are not tokenized ("string" field type) or use a tokenizer that treats the entire input stream as a single token (Keyword Tokenizer).
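Following that note, here is a minimal sketch of building the escaped query in Python. `escape_whitespace` is a hypothetical helper that only handles spaces; other query-syntax characters such as `:`, `(` or `~` would also need escaping in general. It assumes `name` and `aff` are `string` fields or use the Keyword Tokenizer. With such a field, `changhee kim a` is two edits (a space and an `a`) from the indexed `changhee kim`. The example uses `~2` rather than `~10` because the Solr Reference Guide documents the fuzzy edit distance as between 0 and 2.

```python
def escape_whitespace(value: str) -> str:
    """Hypothetical helper: backslash-escape spaces so the query parser keeps
    the whole value as one term. Only spaces are handled here."""
    return value.replace(" ", "\\ ")


name = "changhee kim a"
org = "lg electronics"

# ~2 instead of ~10: the Solr Reference Guide gives 0-2 as the fuzzy edit range.
q = f"name:{escape_whitespace(name)}~2 AND aff:{escape_whitespace(org)}~1"
print(q)  # name:changhee\ kim\ a~2 AND aff:lg\ electronics~1

# Sending it works as in the question (requests URL-encodes the backslashes):
# requests.get("http://localhost:8983/solr/searcher/select",
#              params={"q": q, "wt": "json", "rows": 1, "start": 0}).json()
```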