We recently upgraded the OS:
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
After upgrading, we are facing lot of issues with GitLab (predominantly with Postgres)..
Our GitLab is dockerized i.e. GitLab (and all its internal services including PostgreSQL) is running inside a single container. The container does not have it's own glibc
, so it is using the one from the OS.
ERROR: canceling statement due to statement timeout
STATEMENT:
SELECT relnamespace::regnamespace as schemaname, relname as relname, pg_total_relation_size(oid) bytes FROM pg_class WHERE relkind = 'r';
The timeout messages appear continuously and this results in users facing 502 errors when accessing GitLab.
I checked the statement timeout set on the database.
gitlabhq_production=# show statement_timeout;
statement_timeout
-------------------
1min
(1 row)
I don't know what to make of this. This is probably the default setting. Is this an issue with postgres? What does this mean? Anything I can do to fix this?
EDIT:
Checked pg_stat_activity
and don't see any locks as the server was rebooted earlier. The same query is running fine now but we keep seeing this issue intermittently.
Ran \d pg_class
to check whether the table uses any indexes and also to check the string column.
gitlabhq_production=# \d pg_class
Table "pg_catalog.pg_class"
Column | Type | Modifiers
---------------------+-----------+-----------
relname | name | not null
relnamespace | oid | not null
reltype | oid | not null
reloftype | oid | not null
relowner | oid | not null
relam | oid | not null
relfilenode | oid | not null
reltablespace | oid | not null
relpages | integer | not null
reltuples | real | not null
relallvisible | integer | not null
reltoastrelid | oid | not null
relhasindex | boolean | not null
relisshared | boolean | not null
relpersistence | "char" | not null
relkind | "char" | not null
relnatts | smallint | not null
relchecks | smallint | not null
relhasoids | boolean | not null
relhaspkey | boolean | not null
relhasrules | boolean | not null
relhastriggers | boolean | not null
relhassubclass | boolean | not null
relrowsecurity | boolean | not null
relforcerowsecurity | boolean | not null
relispopulated | boolean | not null
relreplident | "char" | not null
relfrozenxid | xid | not null
relminmxid | xid | not null
relacl | aclitem[] |
reloptions | text[] |
Indexes:
"pg_class_oid_index" UNIQUE, btree (oid)
"pg_class_relname_nsp_index" UNIQUE, btree (relname, relnamespace)
"pg_class_tblspc_relfilenode_index" btree (reltablespace, relfilenode)
Would reindexing all tables and possibly alter
tables help?
You should check whether the query us running for a minute or whether it is blocked behind a database lock. This can be seen from the pg_stat_activity
row for the backend, which will show if the query is waiting for a lock or not (state=active
and wait_event_type
and wait_event
indicate a lock).
If it is a lock, get rid of the locking transaction. It may be a prepared transaction, so check for these too.
If there is no lock at fault, it could be that your indexes have become corrupted by the operating system upgrade:
Since PostgreSQL uses operating system collations, database indexes on strings are sorted in collation order and an operating system upgrade can (and often does) lead to changed collations due to bug fixes in the C library, you should rebuild all indexes on string columns after such an upgrade.
The statement that you are showing does not use an index scan, so it should not be affected, but other statements may be.
Also, if you are using Docker, it may be that your container uses its own glibc that was not upgraded, and then you are not affected.