I'd like to see which terms are used for indexing. This is mostly for debugging, in case I need to do some additional preprocessing on the documents before sending them to Whoosh. A list is fine. Is there a variable that gives me this (perhaps in whoosh.index)?
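There is no variable for this, but an index reader (obtained from ix.reader()) has a method that does it. Use: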
whoosh.reading.IndexReader.all_terms()
Yields (fieldname, text) tuples for every term in the index.