Search code examples
query-optimizationdjango-ormdatabase-optimization

Questions regarding a optimizing a slow query (SQL included)


SELECT DISTINCT "myapp_profile"."user_id", "myapp_profile"."name", 
  "myapp_profile"."age", "auth_user"."id", "auth_user"."username", 
  "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", 
  "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", 
  "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" 
FROM "myapp_profile" 
INNER JOIN "auth_user" ON ("myapp_profile"."user_id" = "auth_user"."id") 
LEFT OUTER JOIN "myapp_siterel" ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id") 
LEFT OUTER JOIN "django_site" ON ("myapp_siterel"."site_id" = "django_site"."id") 
WHERE ("auth_user"."is_superuser" = false 
AND "auth_user"."is_staff" = false 
AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))) 
ORDER BY "myapp_profile"."user_id" 
DESC LIMIT 100

The above query takes about 100 seconds to run with 2 million users/profiles. I'm no DBA and our DBAs are looking at the situation to see what can be done, but since I'll likely never get to see what changes (assuming it happens at the DB level), I'm curious how you could optimized this query. It obviously needs to happen a ton faster than it is happening, like on the order of 5 seconds or less. If there is no way to optimize the SQL, is there an index or indexes you could add/change to make the query it quicker, or is there anything something else I'm overlooking?

Postgres 9 is the DB, and Django's ORM is where this query came from.

Query Plan

Limit (cost=1374.35..1383.10 rows=100 width=106)
-> Unique (cost=1374.35..1391.24 rows=193 width=106)
-> Sort (cost=1374.35..1374.83 rows=193 width=106)
Sort Key: myapp_profile.user_id, myapp_profile.name, myapp_profile.age, auth_user.username, auth_user.first_name, auth_user.last_name, auth_user.email, auth_user.password, auth_user.is_staff, auth_user.is_active, auth_user.is_superuser, auth_user.last_login, auth_user.date_joined
-> Nested Loop (cost=453.99..1367.02 rows=193 width=106)
-> Hash Left Join (cost=453.99..1302.53 rows=193 width=49)
Hash Cond: (myapp_siterel.site_id = django_site.id)
Filter: ((django_site.id IS NULL) OR (django_site.id = ANY ('{10080,10053}'::integer[])))
-> Hash Left Join (cost=448.50..1053.27 rows=15001 width=53)
Hash Cond: (myapp_profile.user_id = myapp_siterel.profile_id)
-> Seq Scan on myapp_profile (cost=0.00..286.01 rows=15001 width=49)
-> Hash (cost=261.00..261.00 rows=15000 width=8)
-> Seq Scan on myapp_siterel (cost=0.00..261.00 rows=15000 width=8)
-> Hash (cost=3.55..3.55 rows=155 width=4)
-> Seq Scan on django_site (cost=0.00..3.55 rows=155 width=4)
-> Index Scan using auth_user_pkey on auth_user (cost=0.00..0.32 rows=1 width=57)
Index Cond: (auth_user.id = myapp_profile.user_id)
Filter: ((NOT auth_user.is_superuser) AND (NOT auth_user.is_staff))

Thanks


Solution

  • I'm not so familiar with postgres, so I'm not sure how good it's query optimiser is, but it looks like everything you have in the where clause could instead be join conditions, although I'd hope postgres is clever enough to work that out for itself, however if it's not then it's going to fetch all your 2 million users with related records in the other 3 tables and then filter that using your where.

    The indexes already mentioned should also work for you if they don't already exist. Again i'm more an MSSQL person but does postgres not have any statistics profile or query plan you can see?

    Something along these lines

    SELECT DISTINCT
        "myapp_profile"."user_id",
        "myapp_profile"."name", 
        "myapp_profile"."age",
        "auth_user"."id",
        "auth_user"."username", 
        "auth_user"."first_name",
        "auth_user"."last_name",
        "auth_user"."email", 
        "auth_user"."password",
        "auth_user"."is_staff",
        "auth_user"."is_active", 
        "auth_user"."is_superuser",
        "auth_user"."last_login",
        "auth_user"."date_joined" 
    FROM "myapp_profile" 
        INNER JOIN "auth_user"
            ON ("myapp_profile"."user_id" = "auth_user"."id") 
            AND "auth_user"."is_superuser" = false
            AND "auth_user"."is_staff" = false 
        LEFT OUTER JOIN "myapp_siterel"
            ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id") 
        LEFT OUTER JOIN "django_site"
            ON ("myapp_siterel"."site_id" = "django_site"."id") 
            AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))
    ORDER BY "myapp_profile"."user_id" DESC
    LIMIT 100
    

    Also, do you need the distinct? That'll also slow it down somewhat.