Tags: python, django, django-models, python-asyncio

Django coroutine to query data from DB in batches and yield as an iterable


I am using Django 4.2 with Python 3.8. I have a MySQL 8 DB containing the following models:

class A(models.Model):
    id  = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    name = models.CharField(max_length=522)

class B(models.Model):
    name = models.CharField(max_length=255, unique=True)
    a = models.ManyToManyField(A)

I want to create a function that, given the id of an A instance, queries all related instances of B. Because B can be very large (millions of rows), I want to iterate over it in batches of 1000 without loading everything into memory.

Is it possible to do so?

I wanted to use coroutines, as I think I can use them to build an iterator-like object.

Are there any other alternatives to make sure it is performant enough that I can parallelize it across 100+ organizations?

I would like code snippets if possible so I can understand this better.

Context: this is a management command that needs to run as a daily cronjob, so it needs to be fast.


Solution

  • You can use Django's __range lookup to break the queryset up into chunks. Note that B uses Django's default integer AutoField primary key, so chunking on pk ranges works here; A's primary key is a UUID, so the same trick would not work in the other direction:

    from django.db.models import Max

    def b_in_batches(a_id, chunk_size=1000):
        qs = B.objects.filter(a__pk=a_id)

        # Step over pk values, not the row count: pks can have gaps,
        # so batches may be smaller than chunk_size but never larger.
        max_pk = qs.aggregate(max_pk=Max("pk"))["max_pk"] or 0

        for start in range(0, max_pk + 1, chunk_size):
            # __range is inclusive at both ends
            yield qs.filter(pk__range=(start, start + chunk_size - 1))


    This yields querysets covering all B rows related to the given A, in chunks of at most 1000 rows each; no batch touches the database until you iterate over it.
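As an alternative, you can do the batching in plain Python on top of any iterator, which pairs well with Django's built-in QuerySet.iterator(chunk_size=...) (which streams rows from the database in chunks instead of caching the whole result set). A minimal sketch of such a helper — in_batches is a hypothetical name, and it works on any iterable, not just querysets:

```python
from itertools import islice

def in_batches(iterable, batch_size=1000):
    """Lazily yield lists of up to batch_size items from any iterable."""
    it = iter(iterable)
    while True:
        # islice pulls at most batch_size items per pass,
        # so only one batch is ever held in memory.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# With Django this could be driven as, e.g.:
#   for batch in in_batches(B.objects.filter(a__pk=a_id).iterator(chunk_size=1000)):
#       process(batch)
```

Because islice only materializes batch_size items at a time, at most one batch lives in memory, which matches the "millions of rows, batches of 1000" requirement in the question.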