Search code examples
pythondjangodatabasepostgresqldatabase-table

approximate count of rows in postgresql django


I have a postgresql database connected to with django. In database there is lots of big tables which I want their row count. Because of large size of tables this takes a lot of time to execute.

I founded that the approximate count could be retrieved from pg_class. Is there any way to this in Django and not executing raw query?


Solution

  • I propose to use a dedicated package for this named django-postgres-fuzzycount [GitHub]. This package provides a manager that will do the fast counting.

    You can install the package with pip (for example in your local environment):

    $ pip install django-postgres-fuzzycount
    

    Then you can add the FuzzyCountManager to the models where you want to obtain an approximative count:

    from django.db import models
    from fuzzycount import FuzzyCountManager
    
    class SomeModel(models.Model):
    
        #  ... (some fields) ...
    
        objects = models.Manager()
        approx = FuzzyCountManager()

    and then you can count approximatively with:

    SomeModel.approx.count()

    In case you .filter(..), Django will calculate the real number, since the pg_class table only stores an apprixmative number of rows for the entire table, so:

    SomeModel.approx.filter(foo=bar).count()

    will take more time (depending on indexes, etc.).

    You can also "patch" the objects manager directly, but then obtaining the real number of records might be harder:

    from django.db import models
    from fuzzycount import FuzzyCountManager
    
    class SomeModel(models.Model):
    
        #  ... (some fields) ...
    
        objects = FuzzyCountManager()

    It is also nice that if you change the backend database to another database, the FuzzyCountManager(..) will act as a normal Manager, so in case you later change the database system, you do not have to rewrite the managers.