Search code examples
postgresqltimestampplpgsqlepochpostgresql-9.5

Comparing timestamps to epoch seconds


In PostgreSQL 9.5 I have a table with 67000 records:

# \d words_nouns
           Table "public.words_nouns"
 Column  |           Type           | Modifiers 
---------+--------------------------+-----------
 word    | text                     | not null
 hashed  | text                     | not null
 added   | timestamp with time zone | 
 removed | timestamp with time zone | 
Indexes:
    "words_nouns_pkey" PRIMARY KEY, btree (word)
Check constraints:
    "words_nouns_word_check" CHECK (word ~ '^[A-Z]{2,}$'::text)

And a similar table words_verbs with 36000 records.

Is it a good idea to define the following custom function:

CREATE OR REPLACE FUNCTION words_get_added(
                in_visited integer,
                OUT out_json jsonb
        ) RETURNS jsonb AS
$func$
DECLARE
        _added text[];
BEGIN
        -- create array with words added to dictionary since in_visited timestamp
        IF in_visited > 0 THEN
                _added := (
                        SELECT ARRAY_AGG(hashed) 
                        FROM words_nouns 
                        WHERE EXTRACT(EPOCH FROM added) > in_visited
                        UNION
                        SELECT ARRAY_AGG(hashed) 
                        FROM words_verbs 
                        WHERE EXTRACT(EPOCH FROM added) > in_visited
                );

                IF  CARDINALITY(_added) > 0 THEN
                        out_json := jsonb_build_object('added', _added);
                END IF;
        END IF;
END
$func$ LANGUAGE plpgsql;

or should I better convert in_visited to a timestamp with timezone and compare to that:

CREATE OR REPLACE FUNCTION words_get_added(
                in_visited integer,
                OUT out_json jsonb
        ) RETURNS jsonb AS
$func$
DECLARE
        _added text[];
BEGIN
        -- create array with words added to dictionary since in_visited timestamp
        IF in_visited > 0 THEN
                _added := (
                        SELECT ARRAY_AGG(hashed) 
                        FROM words_nouns 
                        WHERE added > TO_TIMESTAMP(in_visited)
                        UNION
                        SELECT ARRAY_AGG(hashed) 
                        FROM words_verbs 
                        WHERE added > TO_TIMESTAMP(in_visited)
                );

                IF CARDINALITY(_added) > 0 THEN
                        out_json := jsonb_build_object('added', _added);
                END IF;
        END IF;
END
$func$ LANGUAGE plpgsql;

Below are 2 EXPLAIN outputs, but I am not sure how to interpret them:

# EXPLAIN SELECT ARRAY_AGG(hashed)
FROM words_nouns
WHERE EXTRACT(EPOCH FROM added) > 0
UNION
SELECT ARRAY_AGG(hashed)
FROM words_verbs
WHERE EXTRACT(EPOCH FROM added) > 0;
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Unique  (cost=2707.03..2707.04 rows=2 width=32)
   ->  Sort  (cost=2707.03..2707.03 rows=2 width=32)
         Sort Key: (array_agg(words_nouns.hashed))
         ->  Append  (cost=1740.53..2707.02 rows=2 width=32)
               ->  Aggregate  (cost=1740.53..1740.54 rows=1 width=32)
                     ->  Seq Scan on words_nouns  (cost=0.00..1684.66 rows=22348 width=32)
                           Filter: (date_part('epoch'::text, added) > '0'::double precision)
               ->  Aggregate  (cost=966.45..966.46 rows=1 width=32)
                     ->  Seq Scan on words_verbs  (cost=0.00..936.05 rows=12157 width=32)
                           Filter: (date_part('epoch'::text, added) > '0'::double precision)
(10 rows)

# EXPLAIN SELECT ARRAY_AGG(hashed)
FROM words_nouns
WHERE added > to_timestamp(0)
UNION
SELECT ARRAY_AGG(hashed)
FROM words_verbs
WHERE added > to_timestamp(0);
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Unique  (cost=2361.99..2362.00 rows=2 width=32)
   ->  Sort  (cost=2361.99..2361.99 rows=2 width=32)
         Sort Key: (array_agg(words_nouns.hashed))
         ->  Append  (cost=1517.06..2361.98 rows=2 width=32)
               ->  Aggregate  (cost=1517.06..1517.07 rows=1 width=32)
                     ->  Seq Scan on words_nouns  (cost=0.00..1517.05 rows=1 width=32)
                           Filter: (added > '1970-01-01 01:00:00+01'::timestamp with time zone)
               ->  Aggregate  (cost=844.88..844.89 rows=1 width=32)
                     ->  Seq Scan on words_verbs  (cost=0.00..844.88 rows=1 width=32)
                           Filter: (added > '1970-01-01 01:00:00+01'::timestamp with time zone)
(10 rows)

The question is: which of the 2 stored functions has better performance or do they not differ?


Solution

  • The key factor for performance is an index that matches your query. Typically, you'd have an index on the plain column added, which should be matched with an input parameter of the same type, where the index is applicable.

    For the task at hand, combine a plain index on added with your second function (added > TO_TIMESTAMP(in_visited)) - or a variation on the theme. The functional transformation happens before the value is compared to the column added, thus the expression is "sargable".

    For top read performance you might have a multicolumn index on (added, hashed), and keep your table vacuumed to allow index-only scans ...

    Related: