I have a PostgreSQL database table with 10,000,000 rows and 60 columns (features). I define a Django QuerySet as follows:
MyQ = MyDataBase.objects.filter(Name='Mike', date=date(2018, 2, 11), Class='03')
There are only 5 rows that satisfy the above filter. But when I try something like
MyQ.count() #which equals 5
or
MyQ.aggregate(Sum('Score'))['Score__sum'] #which equals 61
each takes about 3 minutes to return a result. Isn't that weird? Aren't querysets supposed to make life easier by focusing only on the rows we asked for? Counting 5 rows, or summing one field across them, shouldn't take that long. What am I doing wrong?
I should also mention this: the first time I ran this code on this table, everything was fine and it took maybe 1 second to fetch the result, but now the 3 minutes is really annoying. I have not changed anything in the database or the code since then.
Generally, if you are filtering your table on a certain field or combination of fields, you should create an index on those fields. It allows the database query planner to take a more optimized path when searching and sorting.
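The effect is easy to see in miniature. Here's a hedged sketch using SQLite from the Python standard library instead of Postgres (the table and column names are made up to mirror the question): the planner's EXPLAIN output switches from a full-table scan to an index search once an index on the filtered columns exists.

```python
import sqlite3

# In-memory table shaped loosely like the one in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, date TEXT, class TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [("Mike", "2018-02-11", "03", i) for i in range(1000)],
)

query = "SELECT * FROM mytable WHERE name = ? AND date = ? AND class = ?"
params = ("Mike", "2018-02-11", "03")

def plan(conn, query, params):
    # EXPLAIN QUERY PLAN rows carry a human-readable detail string last.
    rows = conn.execute("EXPLAIN QUERY PLAN " + query, params)
    return " ".join(r[-1] for r in rows)

before = plan(conn, query, params)  # shows a full-table SCAN
conn.execute("CREATE INDEX idx_filter ON mytable (name, date, class)")
after = plan(conn, query, params)   # shows SEARCH ... USING INDEX idx_filter
print(before)
print(after)
```

Postgres reports the same distinction (Seq Scan vs. Index Scan) through its own EXPLAIN command.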
It looks from your question like you're using Postgres, so you can run SELECT * FROM pg_indexes WHERE tablename = 'yourtable';
in psql to see any existing indexes.
Django can create these indexes for you in your model definition via the db_index option (note that lowercase class is a reserved word in Python and can't be used as a field name; the capitalized Class from your question is fine). For example, your model MyDataBase might look something like this:

class MyDataBase(models.Model):
    Name = models.TextField(db_index=True)
    date = models.DateField(db_index=True)
    Class = models.TextField(db_index=True)

After adding the indexes, run makemigrations and migrate so they are actually created in the database.
Here's some more reading specific to creating indexes on Django models: gun.io/blog/learn-indexing-dammit