Search code examples
postgresqlupgradepostgresql-9.6postgresql-12

Postgres pg_upgrade corrupt indexes (UK) missing values though missing records by index based selects


After 9.6->12.3 pg_upgrade we marked some serious select give missing results! REINDEX or drop / create healthed the problem.

Upgrade bulletpoints

  1. Stop 9.6
  2. rsync 9.6 data and bin files from Centos7 to Centos8 (pre installed 12)
  3. pg_upgrade
  4. ./analyze_new_cluster.sh
  5. ./delete_old_cluster.sh

Per database we found 1-3 UNIQUE corrupt indexes. Missed circa 20 values per index.

We found a very useful tool amcheck! https://www.postgresql.org/docs/10/amcheck.html

SELECT bt_index_check(c.oid), c.relname, c.relpages
FROM pg_index i
JOIN pg_opclass op ON i.indclass[0] = op.oid
JOIN pg_am am ON op.opcmethod = am.oid
JOIN pg_class c ON i.indexrelid = c.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE am.amname = 'btree' AND n.nspname = 'pg_catalog'
-- Don't check temp tables, which may be from another session:
AND c.relpersistence != 't'
-- Function may throw an error when this is omitted:
AND i.indisready AND i.indisvalid
ORDER BY c.relpages DESC LIMIT 10;

VERY IMPORTANT: Comment out (AND n.nspname = 'pg_catalog' + LIMIT 10)restictions by validation, to run bt_index_check function to your index too!

And yes the function throw an exception if find a corrupt index.

Why does index go wrong? How can we be sure our new db is consistent and upgrade is succesful?


Solution

  • Why does index go wrong?

    The most likely explanation is a difference in the glibc versions between CentOS 7 and CentOS 8.

    This blog post gives some more insights about this.

    In general when changing glibc versions (e.g. because of system patches or OS upgrade), you should re-index all indexes that include text, varchar or char columns.

    This is not something Postgres can directly influence, though the ability to use ICU collations is a partial answer to that problem. But if the ICU version of the operating system is updated (e.g. again implicitly because of system patches) you would have the same problem there (but it seems the ICU libraries are less often updated than the glibc libraries)

    I think there is some work in progress to at least warn the user. But to my knowledge nothing as been committed in the current version 12 or the upcoming version 13.