Search code examples
sqlpostgresqlindexingcounttimezone

Add an index to a timestamp with time zone


I want to improve this slow query, I thinking to add an index, but I don't know what index type is better to my case.

SELECT COUNT(*) ct FROM events
WHERE dtt AT TIME ZONE 'America/Santiago'
   >= date(now() AT TIME ZONE 'America/Santiago') + interval '1s'  

Query Plan:

"Aggregate  (cost=128032.03..128032.04 rows=1 width=0) (actual time=3929.083..3929.083 rows=1 loops=1)"
"  ->  Seq Scan on events  (cost=0.00..125937.68 rows=837742 width=0) (actual time=113.080..3926.972 rows=25849 loops=1)"
"        Filter: (timezone('America/Santiago'::text, dtt) >= (date(timezone('America/Santiago'::text, now())) + '00:00:01'::interval))"
"        Rows Removed by Filter: 2487386"
"Planning time: 0.179 ms"
"Execution time: 3929.136 ms"
  • The query gets the count of events of the day.
  • dtt is a timestamp with time zone column.
  • I'm using Postgresql 9.4.

Note: With the Erwin advices the query run a little faster but still I think isn't fast enough.

"Aggregate  (cost=119667.76..119667.77 rows=1 width=0) (actual time=3687.151..3687.152 rows=1 loops=1)"
"  ->  Seq Scan on vehicle_events  (cost=0.00..119667.14 rows=250 width=0) (actual time=104.635..3687.068 rows=469 loops=1)"
"        Filter: (dtt >= timezone('America/Santiago'::text, date_trunc('day'::text, timezone('America/Santiago'::text, now()))))"
"        Rows Removed by Filter: 2513337"
"Planning time: 0.164 ms"
"Execution time: 3687.204 ms"

Solution

  • First, fix your query to make the predicate "sargable":

    SELECT count(*) AS ct
    FROM   events
    WHERE  dtt >= date_trunc('day', now() AT TIME ZONE 'America/Santiago')
                                          AT TIME ZONE 'America/Santiago';
    

    Use the column value as is and move all calculations to the parameter.

    That's right, after truncating to the local start of the day, apply AT TIME ZONE a second time to convert the timestamp value back to timestamptz again. See:

    Explanation step-by-step

    1. now()
      .. is the Postgres implementation for the SQL standard CURRENT_TIMESTAMP. Both are 100 % equivalent, you can use either. It returns the current point in time as timestamptz - the display of the value takes the time zone of the current session into consideration, but that's irrelevant for the value.

    2. now() AT TIME ZONE 'America/Santiago'
      .. computes the local time for the given time zone. The resulting data type is timestamp. We do this to allow for:

    3. date_trunc( now() AT TIME ZONE 'America/Santiago' )
      .. truncates the time component to get the local start of the day in 'America/Santiago', independent of the current time zone setting.

    4. date_trunc('day', now() AT TIME ZONE 'America/Santiago') AT TIME ZONE 'America/Santiago'
      .. feeding the timestamp to the AT TIME ZONE construct we get the corresponding timestamptz value (UTC internally) to compare the timestamptz value dtt to.

    I removed the + interval '1s', suspecting you have just been abusing that to convert the date to timestamp. Use date_trunc() instead to produce a timestamp value.

    Now, a plain (default) btree index on dtt will do. Of course, the index will only be used, if the predicate is selective enough.

    CREATE INDEX events_dtt_idx ON events (dtt);
    

    If your important queries only consider recent rows, a partial index might help some more. Details: