Search code examples
djangodjango-rest-frameworkdjango-2.2django-jsonfield

Optimising API queries using JSONField()


Initial opening: I am utilising postgresql JSONFields.

I have the following attribute (field) in my User model:

class User(AbstractUser):
   ...
   benefits = JSONField(default=dict())
   ...

I essentially currently serialize benefits for each User on the front end with DRF:

    benefits = UserBenefit.objects.filter(user=self)
    serializer = UserBenefitSerializer(benefits, many=True)

As the underlying returned benefits changes little and slowly, I thought about "caching" the JSON in the database every time there is a change to improve the performance of the UserBenefit.objects.filter(user=user) QuerySet. Instead, becoming user.benefits and hopefully lightening DB load over 100K+ users.


1st Q:

Should I do this?

2nd Q:

Is there an efficient way to write the corresponding serializer.data <class 'rest_framework.utils.serializer_helpers.ReturnList'> to the JSON field?

I am currently using:

data = serializers.serialize("json", UserBenefit.objects.filter(user=self))


Solution

  • For your first question:

    It's not a bad idea if you don't want to use caching alternatives.

    If you have to query the database because of some changes or ... and you can't cache the hole request, then the idea of saving a JSON object can be a pretty good idea. This way you only retrieve the data and skip most parts of serializing and also terminate the need to query a pivot table to get the m2m data. But also note that this way, you are adding a whole bunch of extra data to your rows and unless you're going to need them most of the time, and you will get extra data that you don't really need which you can help it using values function on querysets but still it requires more coding. Basically, you're going to use more bandwidth for your first query and more storage to store the data instead of process power. Also, the pagination will be really hard to achieve on your benefits if you need it at some point.

    Getting m2m relation data is usually pretty fast depending on the amount of data you have on your database but the ultimate way of getting better performance is caching the requests and reducing the database hits as much as possible.

    And as you probably hear it a lot, you should test and benchmark to see which options really works for you the best depending on your requirements and limitations. It's really hard to suggest an optimization method without knowing the information about the whole scope and the current solution.

    And for your second question:

    I think I don't really get it. If you are storing a JSON object which is a field in User model, then why do you need data = serializers.serialize("json", UserBenefit.objects.filter(user=self)) ?

    You don't need it since the serializer can just return the JSON field data.