python python-2.7 full-text-search whoosh

Whoosh - Slop Operator Behaviour

Indexed text: income tax expense resulting from the utilization of net operating loss carry forwards

Query Formats tried:

q = QueryParser(u"content", ix.schema).parse(u"income utilization~3")
q = QueryParser(u"content", ix.schema).parse(u"'income utilization'~3")

The slop operator does not seem to work for my use case. Neither of the formats above takes the slop value into account: the document is always returned, even when the slop condition is not met. Can you please help?

Output:

 (content:income AND content:utilization)
 <Hit {'title': u'test'}>

Full Snippet:

import os

from whoosh.fields import Schema, ID, TEXT
from whoosh.index import create_in, open_dir
from whoosh.qparser import QueryParser


schema = Schema(title=ID(stored=True), content=TEXT)

def setup():
    if not os.path.exists("indexdir"):
        os.makedirs("indexdir")

    ix = create_in("indexdir", schema)
    writer = ix.writer()
    writer.add_document(title=u"test", content=u"income tax expense resulting from the utilization of net operating loss carry forwards")
    writer.commit()

def fetch():
    ix = open_dir("indexdir")
    with ix.searcher() as searcher:
        # Query exactly as tried above; printing it shows how it was parsed
        q = QueryParser(u"content", ix.schema).parse(u"income utilization~3")
        print(q)
        results = searcher.search(q)
        for r in results:
            print(r)

if __name__ == '__main__':
    setup()
    fetch()

Solution

You are confusing the fuzzy operator with the slop operator:

Fuzzy operator (edit distance): word~ and word~n apply to a single term and match words within an edit distance of n from it.
Slop operator: "word1 word2 ... wordk"~n applies to a phrase in double quotes and runs a phrase search with a slop of n.
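To make the idea of slop concrete, here is a minimal sketch in plain Python. It is not Whoosh's implementation: the helper name `within_slop` is hypothetical, tokens come from a simple whitespace split (Whoosh's analyzer may drop stop words and renumber positions, so its counts can differ), and slop is modelled as the largest allowed gap in positions between consecutive phrase words.

```python
def within_slop(text, phrase, slop):
    """Return True if the words of `phrase` occur in order in `text`,
    with each consecutive pair at most `slop` positions apart.

    Hypothetical helper for illustration only; not part of Whoosh.
    """
    tokens = text.lower().split()
    words = phrase.lower().split()
    if not words:
        return False

    def search(word_index, prev_pos):
        # All phrase words have been placed: the phrase matches.
        if word_index == len(words):
            return True
        for pos, token in enumerate(tokens):
            if token != words[word_index]:
                continue
            # The first word may appear anywhere; each later word must
            # follow the previous one by between 1 and `slop` positions.
            if prev_pos is None or 0 < pos - prev_pos <= slop:
                if search(word_index + 1, pos):
                    return True
        return False

    return search(0, None)


text = u"income tax expense resulting from the utilization of net operating loss carry forwards"

print(within_slop(text, u"income utilization", 3))  # False: the words are 6 positions apart
print(within_slop(text, u"income utilization", 6))  # True
print(within_slop(text, u"income tax", 1))          # True: adjacent words
```

In this simplified model a slop of 1 means the words must be adjacent, and a larger slop lets other words sit between them. An edit distance, by contrast, compares the characters of a single word and says nothing about positions.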

You should try:

# "income utilization"~3
q = QueryParser(u"content", ix.schema).parse(u'"income utilization"~3')
