Search code examples
sqlpostgresqlperformancesql-execution-plan

postgresql query performance enhancement


I am trying to get a row with highest popularity. Ordering by descending popularity is slowing down the query significantly. Is there a better way to optimize this query ?

Postgresql - 9.5

    ```explain analyse SELECT  v.cosmo_id,
            v.resource_id, k.gid, k.popularity,v.cropinfo_id
            FROM rmsg.verifications V INNER JOIN rmip.resourceinfo R ON
            (R.id=V.resource_id AND R.source_id=54) INNER JOIN rmpp.kgidinfo K ON 
            (K.cosmo_id=V.cosmo_id) WHERE V.status=1 AND 
            v.crop_Status=1 AND V.locked_time isnull ORDER BY k.popularity 
            desc, (v.cosmo_id,
            v.resource_id, v.cropinfo_id) LIMIT 1;```


        QUERY PLAN                                                                                

        Limit  (cost=470399.99..470399.99 rows=1 width=31) (actual time=19655.552..19655.553 rows=1 loops=1)
        Sort  (cost=470399.99..470434.80 rows=13923 width=31) (actual time=19655.549..19655.549 rows=1 loops=1)
            Sort Key: k.popularity DESC, (ROW(v.cosmo_id, v.resource_id, v.cropinfo_id))
            Sort Method: top-N heapsort  Memory: 25kB
             ->  Nested Loop  (cost=19053.91..470330.37 rows=13923 width=31) (actual time=58.365..19627.405 rows=23006 loops=1)
                   ->  Hash Join  (cost=19053.48..459008.74 rows=13188 width=16) (actual time=58.275..19268.339 rows=19165 loops=1)
                         Hash Cond: (v.resource_id = r.id)
                         ->  Seq Scan on verifications v  (cost=0.00..409876.92 rows=7985725 width=16) (actual time=0.035..11097.163 rows=9908140 loops=1)
                               Filter: ((locked_time IS NULL) AND (status = 1) AND (crop_status = 1))
                               Rows Removed by Filter: 1126121
                         ->  Hash  (cost=18984.23..18984.23 rows=5540 width=4) (actual time=57.101..57.101 rows=5186 loops=1)
                               Buckets: 8192  Batches: 1  Memory Usage: 247kB
                               ->  Bitmap Heap Scan on resourceinfo r  (cost=175.37..18984.23 rows=5540 width=4) (actual time=2.827..51.318 rows=5186 loops=1)
                                     Recheck Cond: (source_id = 54)
                                     Heap Blocks: exact=5907
                                     ->  Bitmap Index Scan on resourceinfo_source_id_key  (cost=0.00..173.98 rows=5540 width=0) (actual time=1.742..1.742 rows=6483 loops=1)
                                           Index Cond: (source_id = 54)
        Index Scan using kgidinfo_cosmo_id_idx on kgidinfo k  (cost=0.43..0.85 rows=1 width=23) (actual time=0.013..0.014 rows=1 loops=19165)
                         Index Cond: (cosmo_id = v.cosmo_id)
     Planning time: 1.083 ms
     Execution time: 19655.638 ms
    (21 rows)```

Solution

  • This is your query, simplified by removing parentheses:

    SELECT v.cosmo_id, v.resource_id, k.gid, k.popularity, v.cropinfo_id
    FROM rmsg.verifications V INNER JOIN
         rmip.resourceinfo R
         ON R.id = V.resource_id AND R.source_id = 54 INNER JOIN
         rmpp.kgidinfo K
         ON K.cosmo_id = V.cosmo_id
    WHERE V.status = 1 AND v.crop_Status = 1 AND
          V.locked_time is null
    ORDER BY k.popularity desc, v.cosmo_id, v.resource_id, v.cropinfo_id
    LIMIT 1;
    

    For this query, I would think in terms of indexes on verifications(status, crop_status, locked_time, resource_id, cosmo_id, crop_info_id), resourceinfo(id, source_id), and kgidinfo(cosmo_id). I don't see an easy way to remove the ORDER BY.

    In looking at the query, I wonder if you might have a Cartesian product problem between the two tables.