In PostgreSQL 9.5 I have a table with 67000 records:
# \d words_nouns
           Table "public.words_nouns"
 Column  |           Type           | Modifiers
---------+--------------------------+-----------
 word    | text                     | not null
 hashed  | text                     | not null
 added   | timestamp with time zone |
 removed | timestamp with time zone |
Indexes:
    "words_nouns_pkey" PRIMARY KEY, btree (word)
Check constraints:
    "words_nouns_word_check" CHECK (word ~ '^[A-Z]{2,}$'::text)
And a similar table words_verbs with 36000 records.
Is it a good idea to define the following custom function:
CREATE OR REPLACE FUNCTION words_get_added(
        in_visited integer,
        OUT out_json jsonb
) RETURNS jsonb AS
$func$
DECLARE
    _added text[];
BEGIN
    -- create array with words added to dictionary since in_visited timestamp
    IF in_visited > 0 THEN
        _added := (
            SELECT ARRAY_AGG(hashed)
            FROM   words_nouns
            WHERE  EXTRACT(EPOCH FROM added) > in_visited
            UNION
            SELECT ARRAY_AGG(hashed)
            FROM   words_verbs
            WHERE  EXTRACT(EPOCH FROM added) > in_visited
        );

        IF CARDINALITY(_added) > 0 THEN
            out_json := jsonb_build_object('added', _added);
        END IF;
    END IF;
END
$func$ LANGUAGE plpgsql;
or would it be better to convert in_visited to a timestamp with time zone and compare against that:
CREATE OR REPLACE FUNCTION words_get_added(
        in_visited integer,
        OUT out_json jsonb
) RETURNS jsonb AS
$func$
DECLARE
    _added text[];
BEGIN
    -- create array with words added to dictionary since in_visited timestamp
    IF in_visited > 0 THEN
        _added := (
            SELECT ARRAY_AGG(hashed)
            FROM   words_nouns
            WHERE  added > TO_TIMESTAMP(in_visited)
            UNION
            SELECT ARRAY_AGG(hashed)
            FROM   words_verbs
            WHERE  added > TO_TIMESTAMP(in_visited)
        );

        IF CARDINALITY(_added) > 0 THEN
            out_json := jsonb_build_object('added', _added);
        END IF;
    END IF;
END
$func$ LANGUAGE plpgsql;
Below are the two EXPLAIN outputs, but I am not sure how to interpret them:
# EXPLAIN SELECT ARRAY_AGG(hashed)
FROM words_nouns
WHERE EXTRACT(EPOCH FROM added) > 0
UNION
SELECT ARRAY_AGG(hashed)
FROM words_verbs
WHERE EXTRACT(EPOCH FROM added) > 0;
QUERY PLAN
---------------------------------------------------------------------------------------------
 Unique  (cost=2707.03..2707.04 rows=2 width=32)
   ->  Sort  (cost=2707.03..2707.03 rows=2 width=32)
         Sort Key: (array_agg(words_nouns.hashed))
         ->  Append  (cost=1740.53..2707.02 rows=2 width=32)
               ->  Aggregate  (cost=1740.53..1740.54 rows=1 width=32)
                     ->  Seq Scan on words_nouns  (cost=0.00..1684.66 rows=22348 width=32)
                           Filter: (date_part('epoch'::text, added) > '0'::double precision)
               ->  Aggregate  (cost=966.45..966.46 rows=1 width=32)
                     ->  Seq Scan on words_verbs  (cost=0.00..936.05 rows=12157 width=32)
                           Filter: (date_part('epoch'::text, added) > '0'::double precision)
(10 rows)
# EXPLAIN SELECT ARRAY_AGG(hashed)
FROM words_nouns
WHERE added > to_timestamp(0)
UNION
SELECT ARRAY_AGG(hashed)
FROM words_verbs
WHERE added > to_timestamp(0);
QUERY PLAN
------------------------------------------------------------------------------------------------
 Unique  (cost=2361.99..2362.00 rows=2 width=32)
   ->  Sort  (cost=2361.99..2361.99 rows=2 width=32)
         Sort Key: (array_agg(words_nouns.hashed))
         ->  Append  (cost=1517.06..2361.98 rows=2 width=32)
               ->  Aggregate  (cost=1517.06..1517.07 rows=1 width=32)
                     ->  Seq Scan on words_nouns  (cost=0.00..1517.05 rows=1 width=32)
                           Filter: (added > '1970-01-01 01:00:00+01'::timestamp with time zone)
               ->  Aggregate  (cost=844.88..844.89 rows=1 width=32)
                     ->  Seq Scan on words_verbs  (cost=0.00..844.88 rows=1 width=32)
                           Filter: (added > '1970-01-01 01:00:00+01'::timestamp with time zone)
(10 rows)
The question is: which of the two stored functions performs better, or do they not differ?
The key factor for performance is an index that matches your query. Typically, you'd have an index on the plain column added, which should be compared to an input parameter of the same type so that the index is applicable.
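For example (a minimal sketch; the index names are my own choice):

CREATE INDEX words_nouns_added_idx ON words_nouns (added);
CREATE INDEX words_verbs_added_idx ON words_verbs (added);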
For the task at hand, combine a plain index on added with your second function (added > TO_TIMESTAMP(in_visited)) - or a variation on the theme. The functional transformation is applied to the parameter before the result is compared to the column added, so the expression is "sargable".
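To illustrate, assuming the sketched indexes above, the planner cannot use them for the first form of the predicate, but it can for the second:

-- not sargable: the indexed column is wrapped in an expression,
-- so a plain b-tree index on added cannot be used
WHERE EXTRACT(EPOCH FROM added) > in_visited

-- sargable: the bare column is compared to a value computed from the parameter,
-- so an index on added is applicable
WHERE added > TO_TIMESTAMP(in_visited)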
For top read performance you might have a multicolumn index on (added, hashed), and keep your table vacuumed to allow index-only scans ...
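A sketch of such covering indexes, again with index names of my own choosing:

CREATE INDEX words_nouns_added_hashed_idx ON words_nouns (added, hashed);
CREATE INDEX words_verbs_added_hashed_idx ON words_verbs (added, hashed);

-- autovacuum normally takes care of this; a manual VACUUM also updates
-- the visibility map that index-only scans rely on
VACUUM ANALYZE words_nouns;
VACUUM ANALYZE words_verbs;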