Search code examples
postgresqlquery-optimizationrails-postgresqlsql-optimization

Postgres- pgsql taking more time to retrieve data from table with more than 1.5 billion rows


How can I optimize the table or the query the following pgsql query(takes 34 min. to get 770 records)? Already indexes are added to the table for few columns. Not sure what else can make this query

Query:

SELECT 
    min(p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles') as Date, 
    'America/Los_Angeles' AS Timezone, 
    sum(GREATEST(0, p.value)) as Value, 
    p.uom as UnitOfMeasurement
FROM
    pv.bsa_vessel_vs p                                 
WHERE
        p.start_timestamp AT TIME ZONE p.timezone >= '2017-01-01'
    and p.start_timestamp AT TIME ZONE p.timezone <  '2017-02-01'
    and p.vessel_serial_number ='U57625059'
GROUP BY
    date_trunc('hour', p.start_timestamp AT TIME ZONE p.timezone AT TIME ZONE 'America/Los_Angeles'), p.uom   
ORDER BY
    Date ;

Table:

CREATE TABLE pv.bsa_vessel_vs
(
  bsa_vessel_vs_id bigserial NOT NULL,
  data_source_id bigint NOT NULL,
  start_timestamp timestamp without time zone NOT NULL,
  end_timestamp timestamp without time zone NOT NULL,
  value numeric(12,4) NOT NULL,
  uom text NOT NULL,
  timezone text NOT NULL,
  created_timestamp timestamp without time zone DEFAULT now(),
  updated_timestamp timestamp without time zone DEFAULT now(),
  vessel_serial_number text NOT NULL,
  CONSTRAINT bsa_vessel_vs_pkey PRIMARY KEY (bsa_vessel_vs_id),
  CONSTRAINT bsa_vessel_vs_data_source_id_fkey FOREIGN KEY (data_source_id)
      REFERENCES pv.data_source (data_source_id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE RESTRICT
)
WITH (
  OIDS=FALSE
);

CREATE INDEX pm_start_timestamp_ndex
  ON pv.bsa_vessel_vs
  USING btree
  (start_timestamp DESC NULLS LAST);

CREATE INDEX bsa_vessel_vs_meter_ts_idx
  ON pv.bsa_vessel_vs
  USING btree
  (vessel_serial_number COLLATE pg_catalog."default", start_timestamp, end_timestamp);


CREATE UNIQUE INDEX bsa_vessel_vs_u_idx
  ON pv.bsa_vessel_vs
  USING btree
  (data_source_id, vessel_serial_number COLLATE pg_catalog."default", start_timestamp, end_timestamp DESC);

Thanks Karthey


Solution

  • Change your index so that it includes the same expressions you use in your WHERE clause, that is:

    CREATE INDEX bsa_vessel_vs_meter_ts_2_idx
      ON bsa_vessel_vs
      USING btree
      ( vessel_serial_number COLLATE pg_catalog."default", 
        (start_timestamp AT TIME ZONE timezone), 
        (start_timestamp AT TIME ZONE timezone)
      );
    

    When you define that index, you'll get an execution plan that uses it:

    | QUERY PLAN                                                                                                                                                                                                                                                            |
    | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | Sort  (cost=69.60..69.70 rows=39 width=83)                                                                                                                                                                                                                            |
    |   Sort Key: (min(timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))))                                                                                                                                                                         |
    |   ->  HashAggregate  (cost=67.79..68.57 rows=39 width=83)                                                                                                                                                                                                             |
    |         Group Key: date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))), uom                                                                                                                                          |
    |         ->  Index Scan using bsa_vessel_vs_meter_ts_2_idx on bsa_vessel_vs p  (cost=0.28..67.20 rows=39 width=44)                                                                                                                                                     |
    |               Index Cond: ((vessel_serial_number = 'U57625059'::text) AND (timezone(timezone, start_timestamp) >= '2017-01-01 00:00:00+00'::timestamp with time zone) AND (timezone(timezone, start_timestamp) < '2017-02-01 00:00:00+00'::timestamp with time zone)) |
    

    Whereas, if the index is not there, PostgreSQL resorts to a full table scan:

    | QUERY PLAN                                                                                                                                                                                                                                                              |
    | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
    | Sort  (cost=298.84..298.94 rows=39 width=83)                                                                                                                                                                                                                            |
    |   Sort Key: (min(timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp))))                                                                                                                                                                           |
    |   ->  GroupAggregate  (cost=296.35..297.81 rows=39 width=83)                                                                                                                                                                                                            |
    |         Group Key: (date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))), uom                                                                                                                                          |
    |         ->  Sort  (cost=296.35..296.45 rows=39 width=44)                                                                                                                                                                                                                |
    |               Sort Key: (date_trunc('hour'::text, timezone('America/Los_Angeles'::text, timezone(timezone, start_timestamp)))), uom                                                                                                                                     |
    |               ->  Seq Scan on bsa_vessel_vs p  (cost=0.00..295.32 rows=39 width=44)                                                                                                                                                                                     |
    |                     Filter: ((vessel_serial_number = 'U57625059'::text) AND (timezone(timezone, start_timestamp) >= '2017-01-01 00:00:00+00'::timestamp with time zone) AND (timezone(timezone, start_timestamp) < '2017-02-01 00:00:00+00'::timestamp with time zone)) |
    

    You can check all the setup at dbfiddle here