Search code examples
sqlpostgresqlaggregate-functionspostgresql-8.4postgresql-performance

PostgreSQL query efficiency


I'm working with PostgreSQL (I'm a rookie in the database world) and I'd like to know your opinion on the efficiency of this kind of queries I found in the code I'm working with. These queries have a lot of JOINs, and one of them (bold font) has many rows by request. This forces us to GROUP BY request.id in order to obtain a row by request and a field (bold font) with all this rows data.

I think this kind of queries has to lose lots of time looking for all these maximums, but I can't figure an alternative way of doing this. Any ideas on its efficiency and how to improve it?

SELECT
  request.id AS id,
  max(request_type.name) AS request_type,
  to_char(max(request.timestamp),'DD/mm/YYYY HH24:mi') AS timestamp,
  to_char(max(request.timestamp),'YYYY-mm-DD') AS timestamp_filtering,
  max(state.name) AS request_state,
  max(users.name || ' ' || COALESCE(users.surname,'')) AS create_user,
  max(request.id_create_user) AS id_create_user,
  max(enterprise.name) AS enterprise,
  max(cause_issue.name) AS cause,
  max(request_movements.id_request_state) AS id_state,
  array_to_string(array_agg(DISTINCT act_code.name || '/' || req_res.act_code), ', ') AS act_code, /* here */
  max(revised.code) AS state_revised, 
  max(request_shipment.warehouse) AS warehouse,
  max(req_res.id_warehouse) AS id_warehouse
FROM
  request
  LEFT JOIN users
    ON users.id=request.id_create_user
  LEFT JOIN enterprise
    ON users.id_enterprise=enterprise.id
  LEFT JOIN request_movements
    ON request_movements.id=request.id_request_movement
  LEFT JOIN request_versions
    ON request_versions.id = request_movements.id_version
  LEFT JOIN state
    ON request_movements.id_request_state=state.id
  INNER JOIN request_type
    ON request.id_request_type=request_type.id
  LEFT JOIN cause_issue
    ON request.id_cause_issue=cause_issue.id
  LEFT JOIN request_reserve req_res
    ON req_res.id_request = request.id /* here */
  LEFT JOIN act_code
    ON req_res.id_act_code=act_code.id
  LEFT JOIN request_shipment
    ON (request_shipment.id_request=request.id)
  LEFT JOIN warehouse_enterprise
    ON (warehouse_enterprise.id = request_shipment.id_warehouse_enterprise)
  LEFT JOIN revised
    ON (revised.id = request_shipment.id_revised)
WHERE
  request.id_request_type = "any_type"  
GROUP BY
  request.id

The EXPLAIN returns this.


Solution

  • You can much simplify this query by aggregating values in request_reserve and act_code before you JOIN to the big join. This avoids the need for aggregate functions on all the other columns and should generally be much faster for a larger number of rows.

    SELECT r.id
          ,rt.name AS request_type
          ,to_char(r.timestamp, 'DD/mm/YYYY HH24:mi') AS timestamp
          ,to_char(r.timestamp, 'YYYY-mm-DD') AS timestamp_filtering
          ,s.name AS request_state
          ,u.name || COALESCE(' ' || u.surname, '') AS create_user
          ,r.id_create_user
          ,e.name AS enterprise
          ,c.name AS cause
          ,rm.id_request_state AS id_state
          ,rr.act_code
          ,rd.code AS state_revised
          ,rs.warehouse
          ,rr.id_warehouse
    FROM      request              r
    LEFT JOIN users                u  ON u.id = r.id_create_user
    LEFT JOIN enterprise           e  ON e.id = u.id_enterprise
    LEFT JOIN request_movements    rm ON rm.id = r.id_request_movement
    LEFT JOIN request_versions     rv ON rv.id = rm.id_version
    LEFT JOIN state                s  ON s.id = rm.id_request_state
         JOIN request_type         rt ON rt.id = r.id_request_type
    LEFT JOIN cause_issue          c  ON c.id = r.id_cause_issue
    LEFT JOIN request_shipment     rs ON rs.id_request = r.id
    LEFT JOIN warehouse_enterprise w  ON w.id = rs.id_warehouse_enterprise
    LEFT JOIN revised              rd ON rd.id = rs.id_revised
    LEFT JOIN (
       SELECT rr.id_request, rr.id_warehouse
             ,array_to_string(array_agg(
                 DISTINCT a.name || '/' || rr.act_code), ', ') AS act_code
       FROM   request_reserve rr
       LEFT   JOIN act_code   a ON r.id_act_code = a.id
       GROUP  BY rr.id_request, rr.id_warehouse
       )  rr ON rr.id_request = r.id
    WHERE  r.id_request_type = "any_type";  -- use single quotes for values!
    

    For big queries it is essential that you have a format the human eye can easily parse. Therefore I reformatted before I improved the query. I use table aliases to avoid unwieldy identifiers as much as possible.

    Minor improvement to create_user: no trailing space. If either part of the name can be NULL, I suggest this to avoid a dangling space:

    COALESCE(u.name || ' ' || u.surname, u.name, u.surname)
    

    In PostgreSQL 9.1 or later you could use concat_ws().