I want to search a text with Whoosh. Right now this works only for exact matches of whole tokens (space-delimited). I'd also like to match within a token (e.g. match "add" inside the token "added"). I know about stemming and variations, but these are not what I'm looking for. Thank you for your help!
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, KEYWORD, ID, STORED
from whoosh.qparser import QueryParser

schema = Schema(title=TEXT(), content=TEXT())
indexpath = (r"C:\Users\rettenma\.jupyter\JupyterWork" +
             r"folder\Python_Repository\bin\index")
ix = create_in(indexpath, schema)

writer = ix.writer()
writer.add_document(title=u"First document",
                    content=u"This is the first document we've added!")
writer.commit()

with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse("add")
    results = searcher.search(query, terms=True)
    print(results[0])
This raises an error on the last line because results is empty: the query "add" does not match the indexed token "added".
http://whoosh.readthedocs.io/en/latest/api/query.html#whoosh.query.Regex
Sounds like you need regular expressions.
[EDIT BEGIN]
Hope this helps:
Above is the first example of capturing the words as described by the OP. However, I noticed a problem: the Regex example also captures any word containing "add", e.g. "Addendum", "Daddy" and so on. Having noticed this, I have amended and re-forked the Regex example; the link is below:
[EDIT FINISH]
That is an example with extra testing to make sure you catch all variations of the word "add", e.g. "add" / "adds" / "added" / "additional". Essentially, anything consisting of "add" followed by the rest of the word.
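To see how the choice of pattern widens or narrows the set of matched words, here is a small standard-library sketch. It does not use Whoosh; it assumes, as far as I can tell from Whoosh's behaviour, that a Regex query tests each indexed term from its start, in the way re.match does. The term list, the patterns, and the helper name matching_terms are made up for illustration and are not the patterns from the linked examples:

```python
import re

# Hypothetical lowercased index terms
TERMS = ["add", "adds", "added", "additional", "addendum", "daddy"]


def matching_terms(pattern, terms=TERMS):
    """Return the terms that the pattern matches from the start of the term."""
    rx = re.compile(pattern)
    return [t for t in terms if rx.match(t)]


if __name__ == "__main__":
    # "add" anywhere inside the term: also picks up "daddy"
    print(matching_terms(r".*add"))
    # term must start with "add": drops "daddy", keeps "addendum"
    print(matching_terms(r"add"))
    # only an explicit list of endings: "add", "adds", "added"
    print(matching_terms(r"add(s|ed|ing)?$"))
```

Which pattern is right depends on which variations you actually want; an explicit list of endings is the strictest, but it has to be extended by hand for words like "additional".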