Search code examples
elasticsearchfuzzy-searchfuzzy-logicfuzzy

Is fuzzy query in elasticsearch related to fuzzy logic?


As the title states, what exactly in Elasticsearch's fuzzy-query is related to fuzzy logic?

For example, given a string, a fuzzy query with fuzziness of 2 will return all indexed strings that have a Levenshtein distance of 2. How does the system decide what answers to return if there are multiple matches?

Is there a fuzzy system behind it? one that has triangular functions (for instance) and can be expressed in something like this:

1|   A    B
 |   /\  /\      A = fuzzy set 1
 |  /  \/  \     B = fuzzy set 2
 | /   /\   \
0|/   /  \   \
 ------------
  a  b  c d

I would like a more theoretical answer that tackles what exactly in fuzzy queries is so fuzzy?


Solution

  • Fuzzy String Matching in Elasticsearch is just another way of saying "Approximate String Matching". It is not implemented using Fuzzy Logic.

    Lucene (the library underpinning Elasticsearch and Solr), implements "fuzzy" (approximate) search up to an edit distance of 2 using a Finite State Transducer representing the union of all possible transitions (including edits and deletes for edit distance 1 or 2) between characters in every indexed term.

    It's an efficient data structure for storing and tracing the world of all existing terms that meet the criteria entered. Here's a pic from a good article about these.

    enter image description here

    (shows a Levenshtein Automaton representing the word "food" with up to two edits.)