python, trie, aho-corasick

Efficient trie storage for a web app


I have an Aho-Corasick trie that I run a body of text through. The trie is part of my Flask app, which is deployed on Heroku. Currently I naively store a pickled form of the automaton, unpickle it whenever it is needed, and then use it. What would be a more efficient way to store the Aho-Corasick automaton for a web app like this?


Solution

  • Accessing a trie directly on disk is not trivial, so loading it into memory is a good approach.

    FWIW, try the pyahocorasick library (http://pyahocorasick.readthedocs.io/). Its automaton can be pickled, and it uses a compact memory scheme to limit memory usage.
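
One way to avoid unpickling on every use is to load the automaton once per process and keep it in memory. The following is only a minimal sketch under assumptions: the file name `automaton.pkl` and the helpers `load_automaton` and `find_matches` are hypothetical, and `find_matches` assumes the pickled object is a pyahocorasick `Automaton`, whose `iter()` yields `(end_index, value)` pairs.

```python
import pickle
from functools import lru_cache
from pathlib import Path

# Hypothetical location of the pickled automaton; adjust to your deployment.
AUTOMATON_PATH = Path("automaton.pkl")


@lru_cache(maxsize=None)
def load_automaton(path=AUTOMATON_PATH):
    """Unpickle the automaton once per process and cache the result."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def find_matches(text, path=AUTOMATON_PATH):
    """Scan text with the cached automaton.

    Assumes the pickled object is a pyahocorasick Automaton, whose iter()
    yields (end_index, value) pairs for every keyword found in the text.
    """
    automaton = load_automaton(path)
    return list(automaton.iter(text))
```

A Flask view could then call `find_matches(...)` on each request; only the first call in a given worker process pays the unpickling cost, and later calls reuse the in-memory object.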