I have two long String sequences that are similar:
C50FD711C2C43287351892A4D82F44B055F048C46D2C54197AC1D1E921F11E6699C4057C4B93907518E6DCA51A672D3D3E419160DAE276CB7716D11B94D8C3BB2E4A591329B7AF973D17A7F9336342FFAAFD4D
and
C50FD711C2C43287351892A4D820B5EAC5F048C1E67CAC197AC1D1E921F11C3623C1DCD6493907518E6DCA18CD71016E7FD1160DAE276CB7716D11B94A6B762E4A591329B7AF973D17A7F9336342FFAAFD4D
Their edit distance is 41. I would like to find strings that are similar to each other. I started with a query like this:
GET my_index/_type/_search
{
  "query": {
    "fuzzy": {
      "sequence.keyword": {
        "value": "C50FD711C2C43287351892A4D820B5EAC5F048C1E67CAC197AC1D1E921F11C3623C1DCD6493907518E6DCA18CD71016E7FD1160DAE276CB7716D11B94A6B762E4A591329B7AF973D17A7F9336342FFAAFD4D",
        "boost": 1.0,
        "fuzziness": 50,
        "prefix_length": 10,
        "max_expansions": 200
      }
    }
  }
}
I tried both sequence.keyword and sequence; the field is mapped as both text and keyword. However, the query did not find the other, similar sequence in my index. Why?
The answer is pretty simple: the maximum edit distance allowed is 2 (as can be seen in the source code for the Fuzziness class), so a fuzzy query can never match two strings that are 41 edits apart, whatever value you pass as fuzziness.
You can verify this with a simpler value: if you index AAAAAA and search for AAABBB with fuzziness: 3, you'll get nothing.
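To see why the simpler example falls outside the limit, here is a minimal Python sketch that computes the edit distance between the two test values and compares it with the cap of 2 mentioned above. The helper names (levenshtein, within_fuzzy_limit, MAX_FUZZY_EDITS) are made up for illustration and are not part of Elasticsearch. The sketch uses plain Levenshtein distance, which is an assumption; for these two values, counting transpositions as a single edit would give the same result.

```python
MAX_FUZZY_EDITS = 2  # the cap described in the answer above (hypothetical constant name)


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute) using two DP rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def within_fuzzy_limit(indexed: str, searched: str) -> bool:
    """True if the two values are close enough for a fuzzy query to match."""
    return levenshtein(indexed, searched) <= MAX_FUZZY_EDITS


print(levenshtein("AAAAAA", "AAABBB"))         # 3
print(within_fuzzy_limit("AAAAAA", "AAABBB"))  # False
```

The same function can be run on the two long sequences from the question to check how far apart they are before deciding whether a fuzzy query is the right tool at all.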