Search code examples
pythonindexingwhoosh

Where does Whoosh (Python) physically store the indexed content?


I am beginning to research on content indexing implementation, and was having a look at Whoosh (https://pypi.python.org/pypi/Whoosh/).

I am curious to know where Whoosh stores its content physically - Is it using files?


Solution

  • Whoosh uses a pluggable storage system; if you use the create_in() function then a FileStorage() class is used that stores indexes in files in a directory.

    See the Whoosh quickstart:

    Once you have the schema, you can create an index using the create_in function:

    import os.path
    from whoosh.index import create_in
    
    if not os.path.exists("index"):
        os.mkdir("index")
    ix = create_in("index", schema)
    

    (At a low level, this creates a Storage object to contain the index. A Storage object represents that medium in which the index will be stored. Usually this will be FileStorage, which stores the index as a set of files in a directory.)