django, google-app-engine, djangoappengine

Space-efficient Django model field type for storing large amounts of text


I'm currently trying to port an existing Google App Engine application from webapp2 to Django using djangoappengine.

Is there an equivalent, space-saving way to store the data using Django? I ask because the amount of data a free GAE user can store is limited.

webapp2 model code

from google.appengine.ext import ndb

class TagTrend_refine(ndb.Model):
    tag = ndb.StringProperty()
    trendData = ndb.BlobProperty(compressed=True)

I know that TextField can store a large amount of text, but can it do so using less space? Is using a BlobField possible?

An example of the data stored in trendData (as many as 24783 characters) is:

{"2008": "{\"nodes\": [{\"group\": 0, \"name\": \"ef-code-first\", \"degree\": 6}, {\"group\": 1, \"name\": \"gridview\", \"degree\": 6}, {\"group\": 2, \"name\": \"mvvm\", \"degree\": 6}, {\"group\": 1, \"name\": \"webforms\", \"degree\": 6}, {\"group\": 2, \"name\": \"binding\", \"degree\": 6}, {\"group\": 3, \"name\": \"web-services\", \"degree\": 6}, {\"group\": 2, \"name\": \"datagrid\", \"degree\": 6},...

Solution

  • Django has no built-in field that stores data compressed, but you can use the zlib module to compress data before saving it to the database.

    Here's a sample implementation of such a field in Django:

    import zlib

    from django.db import models

    class CompressedTextField(models.BinaryField):
        # Assumes Django 2.0+. A binary column is used because zlib output is bytes, not text.

        def __init__(self, *args, compress_level=6, **kwargs):
            self.compress_level = compress_level
            super().__init__(*args, **kwargs)

        def from_db_value(self, value, expression, connection):
            # Decompress on the way out of the database.
            if value is None:
                return value
            return zlib.decompress(bytes(value)).decode("utf-8")

        def get_prep_value(self, value):
            # Compress on the way into the database.
            if value is None:
                return value
            return zlib.compress(value.encode("utf-8"), self.compress_level)
    

    The field accepts one extra parameter, compress_level:

    class TagTrend(models.Model):
    
        tag = models.CharField(max_length=1024)
    
        # zlib offers compression levels 0-9
        #    0 is no compression
        #    9 is maximum compression
        trendData = CompressedTextField(compress_level=9)
    

    As an example, the string 'a' * 1024 (1024 bytes) compresses to only 17 bytes.

    Note the limitation of such a field: the data is stored compressed, so database queries will search and filter against the compressed bytes rather than the original text.
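
To get a feel for the savings and for the query limitation outside Django, here is a minimal sketch that uses only the standard library. The sample data is invented to resemble the trendData shown in the question, and the helper names compress_text and decompress_text are hypothetical; neither is part of Django or djangoappengine.

```python
import json
import zlib

# Invented sample shaped like the trendData in the question (not real data).
nodes = [{"group": i % 4, "name": "tag-%d" % i, "degree": 6} for i in range(500)]
raw = json.dumps({"2008": json.dumps({"nodes": nodes})})


def compress_text(text, level=6):
    """Encode text as UTF-8 and compress it with zlib (level 0-9)."""
    return zlib.compress(text.encode("utf-8"), level)


def decompress_text(blob):
    """Reverse compress_text: decompress, then decode UTF-8."""
    return zlib.decompress(blob).decode("utf-8")


blob = compress_text(raw, level=9)
restored = decompress_text(blob)

print(len(raw.encode("utf-8")), "bytes raw")
print(len(blob), "bytes compressed")
print("round trip ok:", restored == raw)

# A substring search works on the original text, but the same bytes
# will generally not appear in the compressed blob, which is why
# filtering on the stored column is not useful.
print("tag-42" in raw)
print(b"tag-42" in blob)
```

Repetitive JSON like this tends to compress well, but the ratio depends on the data, so it is worth measuring with a real trendData value before relying on it to stay within a storage quota.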