Tags: python, django, memcached, suffix-tree

How to store data in the Django cache by reference rather than by value?


I am using a Suffix Tree wrapper for Python programmers: https://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/

I need the same suffix tree instance every time a view is called in Django, so I store the instance in the Django cache and retrieve it whenever I need it.

Problem 1: Every time I retrieve it from the cache, it has a different memory location, even though Python stores data using references.

Problem 2: After two retrievals, Python crashes with "Segmentation fault (core dumped)".

Question 1: Why does the suffix tree instance change its memory location when it is retrieved from the cache?

Question 2: Why does it cause a segmentation fault?

Question 3: Is there any other way to keep a persistent suffix tree instance somewhere in Django, so that I always get the same instance?

$ python manage.py shell                        
Python 2.7.5 (default, Mar 22 2016, 00:57:36) 
[GCC 4.7.2 20121109 (Red Hat 4.7.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import SuffixTree
>>> d=SuffixTree.SubstringDict()
>>> d["3132"]=1
>>> d["3"]
[1]
>>> d["4343"]=2
>>> d["3"]                                                                     
[1, 2]
>>> from django.core.cache import cache
>>> cache.set("c",d,1000)                                                      
>>> d
<SuffixTree.SubstringDict.SubstringDict instance at 0x27bd830>
>>> cache.get("c")
<SuffixTree.SubstringDict.SubstringDict instance at 0x27ca908>
>>> cache.get("c")
<SuffixTree.SubstringDict.SubstringDict instance at 0x27ca9e0>
>>> cache.get("c")
Segmentation fault (core dumped)

Solution

  • The root of the problem is that Django does not keep cached objects in your process's memory, so every object you put in the cache is serialized before storage and deserialized when you get it back. Each retrieval creates a new object that is a copy of the stored one.

    It is implemented this way because in a production environment you will usually have more than one Django worker process (possibly running on different servers), and all of those workers need to share the same cache. So you cannot have the same instance on every request, because your requests can be handled by different workers.

    The workaround will vary depending on the purpose of your app.

    According to your comment, you can create a module that caches an instance between requests:
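To make the copy-on-retrieval behaviour concrete, here is a minimal sketch. `ToyCache` is a hypothetical stand-in, not Django's API; it only assumes that the backend pickles values on `set` and unpickles them on `get`, which is what produces a fresh object each time. A plain dict stands in for the suffix tree.

```python
import pickle


class ToyCache(object):
    """Hypothetical toy cache that stores pickled bytes, not live objects."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        # Serialize on the way in: only bytes are kept.
        self._store[key] = pickle.dumps(value)

    def get(self, key):
        data = self._store.get(key)
        # Deserialize on the way out: this builds a brand-new object.
        return None if data is None else pickle.loads(data)


original = {"3132": 1}  # stand-in for the suffix tree
cache = ToyCache()
cache.set("c", original)

first = cache.get("c")
second = cache.get("c")

print(first is original)   # False: a copy, not the stored instance
print(first is second)     # False: every get() creates another copy
print(first == original)   # True: same contents
```

This is why the addresses in your shell session differ on each `cache.get("c")`. An object backed by a C extension may not survive this round trip cleanly, which would be consistent with the crash you see.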

    from datetime import timedelta, datetime
    
    MAX_LIVE_TIME = timedelta(seconds=3600)
    
    _tree_instance = None
    _tree_timestamp = None
    
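    def get_cached_tree():
        global _tree_instance, _tree_timestamp
        # Reuse the instance while it is younger than MAX_LIVE_TIME.
        if _tree_instance is not None and datetime.now() - _tree_timestamp < MAX_LIVE_TIME:
            return _tree_instance
    
        # First call or expired: build a new instance and record its creation time.
        _tree_instance = 'SuffixTree' # Replace this with SuffixTree creation
        _tree_timestamp = datetime.now()
        return _tree_instance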
    

    Then call get_cached_tree() in your views to get the SuffixTree. You will still have a different instance in each worker process, but it will be much faster and will avoid the segfaults.
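If your workers also run several threads (an assumption about your deployment), two requests could rebuild the tree at the same time. A possible variant guards the module-level instance with a lock. This is only a sketch: `build_tree` is a hypothetical placeholder for your SuffixTree creation, and the optional `now` argument exists just so the expiry logic can be exercised without waiting an hour.

```python
import threading
from datetime import datetime, timedelta

MAX_LIVE_TIME = timedelta(seconds=3600)

_lock = threading.Lock()
_tree_instance = None
_tree_timestamp = None


def build_tree():
    # Hypothetical placeholder: replace with the real SuffixTree creation.
    return object()


def get_cached_tree(now=None):
    """Return the per-process instance, rebuilding it once it has expired."""
    global _tree_instance, _tree_timestamp
    if now is None:
        now = datetime.now()
    with _lock:  # only one thread at a time may check or rebuild
        expired = (_tree_instance is None
                   or now - _tree_timestamp >= MAX_LIVE_TIME)
        if expired:
            _tree_instance = build_tree()
            _tree_timestamp = now
        return _tree_instance


start = datetime.now()
first = get_cached_tree(now=start)
same = get_cached_tree(now=start + timedelta(minutes=5))
rebuilt = get_cached_tree(now=start + timedelta(hours=2))

print(first is same)     # True: same instance within the lifetime
print(first is rebuilt)  # False: rebuilt after MAX_LIVE_TIME
```

The lock only protects creation and lookup. Whether the tree itself is safe to query from several threads depends on the SuffixTree package.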

    P.S. The segmentation fault is caused by a bug in the Python interpreter you are using or, more likely, a bug in the package. Make sure you are using the latest version of the package (https://github.com/JDonner/SuffixTree); if that doesn't help, analyze the stack trace (core dump) and submit a bug report to the SuffixTree repo.