I have been trying to optimize a web service that is using NpgSQL 3.2.7 to connect to a PostgreSQL 9.3 database. Today I installed pgBouncer and noticed when running "select * from pg_stat_activity;" that all of my NpgSQL connections had this query listed:
SELECT ns.nspname, a.typname, a.oid, a.typrelid, a.typbasetype,
CASE WHEN pg_proc.proname='array_recv' THEN 'a' ELSE a.typtype END AS type,
CASE
WHEN pg_proc.proname='array_recv' THEN a.typelem
WHEN a.typtype='r' THEN rngsubtype
ELSE 0
END AS elemoid,
CASE
WHEN pg_proc.proname IN ('array_recv','oidvectorrecv') THEN 3 /* Arrays last */
WHEN a.typtype='r' THEN 2 /* Ranges before */
WHEN a.typtype='d' THEN 1 /* Domains before */
ELSE 0 /* Base types first */
END AS ord
FROM pg_type AS a
JOIN pg_namespace AS ns ON (ns.oid = a.typnamespace)
JOIN pg_proc ON pg_proc.oid = a.typreceive
LEFT OUTER JOIN pg_type AS b ON (b.oid = a.typelem)
LEFT OUTER JOIN pg_range ON (pg_range.rngtypid = a.oid)
WHERE
(
a.typtype IN ('b', 'r', 'e', 'd') AND
(b.typtype IS NULL OR b.typtype IN ('b', 'r', 'e', 'd')) /* Either non-array or array of supported element type */
)
When I run this query in pgAdmin it takes 3 to 5 seconds to complete the second time I run it when everything should be cached. When I have run my code interactively executing the first open command in a web service call has taken 3 to 5 seconds.
Does this run every time a connection is created? It looks to me like this is an expensive query to get some relatively static data. If this does have to run every time a connection is created, does anyone have any suggestions on how to architect around this in a web service? 3 to 5 seconds is just too much overhead for every call to a web service. Does using pooling have any affect on whether or not this query is run?
ADDED: 03/14/2018 These are log entries I am seeing after creating a table to hold the results of the types query. It runs it successfully and then later cannot find the table for some reason.
2018-03-14 15:35:42 EDT LOG: duration: 0.715 ms parse : select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"
2018-03-14 15:35:42 EDT LOG: duration: 0.289 ms bind : select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"
2018-03-14 15:35:42 EDT LOG: execute : select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"
2018-03-14 15:35:42 EDT LOG: duration: 0.391 ms
2018-03-14 15:35:44 EDT ERROR: relation "public.npgsqltypes" does not exist at character 71
2018-03-14 15:35:44 EDT STATEMENT: select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"
2018-03-14 15:35:44 EDT LOG: statement: DISCARD ALL
2018-03-14 15:35:44 EDT LOG: duration: 0.073 ms
ADDED: 03/15/2018
Explain output of types query:
Sort (cost=3015139.78..3018795.67 rows=1462356 width=213)
Sort Key: (CASE WHEN (pg_proc.proname = ANY ('{array_recv,oidvectorrecv}'::name[])) THEN 3 WHEN (a.typtype = 'r'::"char") THEN 2 WHEN (a.typtype = 'd'::"char") THEN 1 ELSE 0 END)
-> Hash Left Join (cost=920418.37..2779709.53 rows=1462356 width=213)
Hash Cond: (a.oid = pg_range.rngtypid)
-> Hash Join (cost=920417.24..2752289.21 rows=1462356 width=209)
Hash Cond: ((a.typreceive)::oid = pg_proc.oid)
-> Hash Join (cost=919817.78..2724270.58 rows=1462356 width=149)
Hash Cond: (a.typnamespace = ns.oid)
-> Hash Left Join (cost=919305.50..2687199.40 rows=1462356 width=89)
Hash Cond: (a.typelem = b.oid)
Filter: (((a.typtype = ANY ('{b,r,e,d}'::"char"[])) AND ((b.typtype IS NULL) OR (b.typtype = ANY ('{b,r,e,d}'::"char"[])))) OR ((a.typname = ANY ('{record,void}'::name[])) AND (a.typtype = 'p'::"char")))
-> Seq Scan on pg_type a (cost=0.00..694015.89 rows=13731889 width=89)
-> Hash (cost=694015.89..694015.89 rows=13731889 width=5)
-> Seq Scan on pg_type b (cost=0.00..694015.89 rows=13731889 width=5)
-> Hash (cost=388.79..388.79 rows=9879 width=68)
-> Seq Scan on pg_namespace ns (cost=0.00..388.79 rows=9879 width=68)
-> Hash (cost=465.87..465.87 rows=10687 width=68)
-> Seq Scan on pg_proc (cost=0.00..465.87 rows=10687 width=68)
-> Hash (cost=1.06..1.06 rows=6 width=8)
-> Seq Scan on pg_range (cost=0.00..1.06 rows=6 width=8)
You're right, this query is issued by Npgsql to load all the types from a PostgreSQL backend - different database can have different data types (due to extensions, user-defined types, etc.).
However, this query is sent only on the first physical connection to a specific database, as identified by its connection string. In other words, if you connect to the same database X times - to the same connection string - you should only see this query being sent once. Npgsql caches this information internally. I just verified that this is the behavior in 3.2.7, are you seeing something else?