Search code examples
sqlpostgresqlpostgresql-performance

Optimize time of execution of PSQL query


It is first time I came across problem of long time of query execution. Problem is actually pretty big because query is executing in more then 20seconds which highly visible for endpoint user.

I have quite large database of topics (~8k), topic's have it's parameters (which is dictionared - I have 113 different parameters for 8k topics).

I would like to show report about number of repetitions of those topics.

topic table:
----------------+---------+-----------------------------------------------------
 id             | integer | nextval('topic_id_seq'::regclass)
 topicengine_id | integer |
 description    | text    |
 topicparam_id  | integer |
 date           | date    |

topicparam table:
----------------+---------+----------------------------------------------------------
 id             | integer | nextval('topicparam_id_seq'::regclass)
 name           | text    |

and my query:

select distinct tp.id as tpid, tp.name as desc, (select count(*) from topic where topic.topicparam_id = tp.id) as count, t.date
from topicparam tp, topic t where t.topicparam_id =tp.id

Total runtime: 22372.699 ms

fragment of result :

 tpid |                     topicname               | count |    date
------+---------------------------------------------+-------+---------
 3823 | Topic1                                      |     6 | 2014-03-01
 3756 | Topic2                                      |    14 | 2014-03-01
 3803 | Topic3                                      |    28 | 2014-04-01
 3780 | Topic4                                      |  1373 | 2014-02-01

Is there any way to optimize time of execution for this query?


Solution

  • A simply group by should do the same thing (if I understood your query correctly.

    select tp.id as tpid, 
           max(tp.name) as desc, 
           count(*) as count, 
           max(t.date) as date
    from topicparam tp
      join topic t on t.topicparam_id = tp.id
    group by tp.id;
    

    Btw: date is a horrible name for a column. For one reason because it's also a reserved word, but more importantly because it does not document what the column contains. A "start date", an "end date", a "due date", a "recording date", a "publish date", ...?