I have over 700 small, locally stored text files (no more that 2Kb large). Each file is located in a folder such as:
2012//05//18.txt
2013//09/22.txt
2014//11//29.txt
(each file represents a day).
The content contains daily "incident" information. I'm interested in being able to locate files containing specific words.
Preface: I've never used whoosh. I just read the quickstart.
Seems like this would be pretty trivial to accomplish. The crux of the work you'll have to do is here:
writer.add_document(title=u"First document", path=u"/a", content=u"This is the first document we've added!")
This is where the index is actually created. Essentially, what it seems you will have to do is use something like os.walk to crawl your root directory, and fill in the data, and for each file it finds, fill in the writer.add_document
params.
for root, dirs, files in os.walk("/your/path/to/logs"):
for file in files:
filepath = os.path.join(root, file)
with open(filepath, 'r') as content_file:
file_contents = content_file.read()
writer.add_document(title=file, content=the_contents, path=filepath)
That's a starting point at least. It seems pretty straightforward to use the Searcher provided by Whoosh to run through your indexed files after that.
with ix.searcher() as searcher:
query = QueryParser("content", ix.schema).parse("somesearchstring")
results = searcher.search(query)
print(results)
Like I said I've never used whoosh but this is what I would do were I you.