Search code examples
postgresqlnpgsql

Need help understanding NpgSQL connection opening process


I have been trying to optimize a web service that is using NpgSQL 3.2.7 to connect to a PostgreSQL 9.3 database. Today I installed pgBouncer and noticed when running "select * from pg_stat_activity;" that all of my NpgSQL connections had this query listed:

SELECT ns.nspname, a.typname, a.oid, a.typrelid, a.typbasetype,
CASE WHEN pg_proc.proname='array_recv' THEN 'a' ELSE a.typtype END AS type,
CASE
  WHEN pg_proc.proname='array_recv' THEN a.typelem
  WHEN a.typtype='r' THEN rngsubtype
  ELSE 0
END AS elemoid,
CASE
  WHEN pg_proc.proname IN ('array_recv','oidvectorrecv') THEN 3    /* Arrays last */
  WHEN a.typtype='r' THEN 2                                        /* Ranges before */
  WHEN a.typtype='d' THEN 1                                        /* Domains before */
  ELSE 0                                                           /* Base types first */
END AS ord
FROM pg_type AS a
JOIN pg_namespace AS ns ON (ns.oid = a.typnamespace)
JOIN pg_proc ON pg_proc.oid = a.typreceive
LEFT OUTER JOIN pg_type AS b ON (b.oid = a.typelem)
LEFT OUTER JOIN pg_range ON (pg_range.rngtypid = a.oid) 
WHERE
  (
    a.typtype IN ('b', 'r', 'e', 'd') AND
    (b.typtype IS NULL OR b.typtype IN ('b', 'r', 'e', 'd'))  /* Either non-array or array of supported element type */
  ) 

When I run this query in pgAdmin it takes 3 to 5 seconds to complete the second time I run it when everything should be cached. When I have run my code interactively executing the first open command in a web service call has taken 3 to 5 seconds.

Does this run every time a connection is created? It looks to me like this is an expensive query to get some relatively static data. If this does have to run every time a connection is created, does anyone have any suggestions on how to architect around this in a web service? 3 to 5 seconds is just too much overhead for every call to a web service. Does using pooling have any affect on whether or not this query is run?

ADDED: 03/14/2018 These are log entries I am seeing after creating a table to hold the results of the types query. It runs it successfully and then later cannot find the table for some reason.

2018-03-14 15:35:42 EDT LOG: duration: 0.715 ms parse : select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"

2018-03-14 15:35:42 EDT LOG: duration: 0.289 ms bind : select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"

2018-03-14 15:35:42 EDT LOG: execute : select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"

2018-03-14 15:35:42 EDT LOG: duration: 0.391 ms

2018-03-14 15:35:44 EDT ERROR: relation "public.npgsqltypes" does not exist at character 71

2018-03-14 15:35:44 EDT STATEMENT: select nspname,typname,oid,typrelid,typbasetype,type,elemoid,ord from "public"."npgsqltypes"

2018-03-14 15:35:44 EDT LOG: statement: DISCARD ALL

2018-03-14 15:35:44 EDT LOG: duration: 0.073 ms

ADDED: 03/15/2018

Explain output of types query:

Sort  (cost=3015139.78..3018795.67 rows=1462356 width=213)
  Sort Key: (CASE WHEN (pg_proc.proname = ANY ('{array_recv,oidvectorrecv}'::name[])) THEN 3 WHEN (a.typtype = 'r'::"char") THEN 2 WHEN (a.typtype = 'd'::"char") THEN 1 ELSE 0 END)
  ->  Hash Left Join  (cost=920418.37..2779709.53 rows=1462356 width=213)
        Hash Cond: (a.oid = pg_range.rngtypid)
        ->  Hash Join  (cost=920417.24..2752289.21 rows=1462356 width=209)
              Hash Cond: ((a.typreceive)::oid = pg_proc.oid)
              ->  Hash Join  (cost=919817.78..2724270.58 rows=1462356 width=149)
                    Hash Cond: (a.typnamespace = ns.oid)
                    ->  Hash Left Join  (cost=919305.50..2687199.40 rows=1462356 width=89)
                          Hash Cond: (a.typelem = b.oid)
                          Filter: (((a.typtype = ANY ('{b,r,e,d}'::"char"[])) AND ((b.typtype IS NULL) OR (b.typtype = ANY ('{b,r,e,d}'::"char"[])))) OR ((a.typname = ANY ('{record,void}'::name[])) AND (a.typtype = 'p'::"char")))
                          ->  Seq Scan on pg_type a  (cost=0.00..694015.89 rows=13731889 width=89)
                          ->  Hash  (cost=694015.89..694015.89 rows=13731889 width=5)
                                ->  Seq Scan on pg_type b  (cost=0.00..694015.89 rows=13731889 width=5)
                    ->  Hash  (cost=388.79..388.79 rows=9879 width=68)
                          ->  Seq Scan on pg_namespace ns  (cost=0.00..388.79 rows=9879 width=68)
              ->  Hash  (cost=465.87..465.87 rows=10687 width=68)
                    ->  Seq Scan on pg_proc  (cost=0.00..465.87 rows=10687 width=68)
        ->  Hash  (cost=1.06..1.06 rows=6 width=8)
              ->  Seq Scan on pg_range  (cost=0.00..1.06 rows=6 width=8)

Solution

  • You're right, this query is issued by Npgsql to load all the types from a PostgreSQL backend - different database can have different data types (due to extensions, user-defined types, etc.).

    However, this query is sent only on the first physical connection to a specific database, as identified by its connection string. In other words, if you connect to the same database X times - to the same connection string - you should only see this query being sent once. Npgsql caches this information internally. I just verified that this is the behavior in 3.2.7, are you seeing something else?