Search code examples
sqlpostgresqlindexingddl

Postgres array_position(array, element) sometimes 0-indexed?


Postgres method array_position(array, element), like other things in SQL, is 1-based. For example:

SELECT array_position(array[4,5,6], 5)    -- returns 2

But, I'm doing the following query to retrieve non-unique indexes from pg_catalog, along with their columns:

SELECT non_unique_indexes.indname          AS index_name,
       non_unique_indexes.relname          AS table_name,
       columns.column_name                 AS column_name,
       array_position(non_unique_indexes.indkey, columns.ordinal_position::smallint) AS column_position,
       CASE non_unique_indexes.indoption[array_position(non_unique_indexes.indkey, columns.ordinal_position::smallint)]
           WHEN 1 THEN 'DESC'
           WHEN 3 THEN 'DESC'
           ELSE 'ASC'
       END AS direction
FROM (
    SELECT pg_namespace.nspname,
           pg_class_tables.relname,
           pg_class_indexes.relname AS indname,
           pg_index.indkey,
           pg_index.indoption,
           pg_tablespace.spcname
    FROM pg_catalog.pg_index
    INNER JOIN pg_catalog.pg_class pg_class_tables  ON pg_class_tables.oid  = pg_index.indrelid
    INNER JOIN pg_catalog.pg_class pg_class_indexes ON pg_class_indexes.oid = pg_index.indexrelid
    INNER JOIN pg_catalog.pg_tablespace ON pg_tablespace.oid = pg_class_indexes.reltablespace
    INNER JOIN pg_catalog.pg_namespace ON pg_namespace.oid = pg_class_indexes.relnamespace
    WHERE pg_index.indisunique = false
      AND pg_namespace.nspname = 'my_schema'
) non_unique_indexes
INNER JOIN information_schema.columns ON columns.table_schema     = non_unique_indexes.nspname
                                     AND columns.table_name       = non_unique_indexes.relname
                                     AND columns.ordinal_position = ANY(non_unique_indexes.indkey)
ORDER BY non_unique_indexes.relname,
         non_unique_indexes.indname,
         array_position(non_unique_indexes.indkey, columns.ordinal_position::smallint)

The 4th select searches the indkey array for the column's ordinal_position, so for each column it gets its position in the index.

Funny thing is that first column has position 0, so I had to add + 1 to have it 1-based.

The subsequent CASE expression, where the very same value is used as index for retrieving n-th element of indoption, curiously works fine even though [] operator is 1-based as well:

SELECT (array[4,5,6])[2]    -- returns 5

How is this?

I'm currently on PG 9.6.


Solution

  • Array subscripts

    You stated:

    Postgres method array_position(array, element), like other things in SQL, is 1-based.

    But that's subtly incorrect. Postgres arrays are 1-based by default. But Postgres allows any range of integers as index. And the function array_position() isn't anything-based. It just returns the index as found.

    SELECT array_position('[7:9]={4,5,6}'::int[], 5);  -- returns 8!
    

    See:

    And pg_index.indkey is not an array to begin with. It's type int2vector, which is an internal type, not available for general use, and 0-based! It allows subscripts (similar to an array). A cast to int2[] preserves 0-based array subscripts (indices).

    Proper query

    Either way, your query doesn't seem right.

    The INNER JOIN on pg_tablespace eliminates indexes stored in the default tablespace. The manual on pg_class.reltablespace:

    If zero, the database's default tablespace is implied.

    But there is no entry in pg_tablespace with oid = 0, so make that a LEFT JOIN.

    There are many more caveats if you try to extract parts of the index definition by hand. What you have for ASC / DESC doesn't quite cut it. See:

    And you didn't even consider NULLS FIRST | LAST. Or a possible WHERE condition for partial indices, ...

    I strongly suggest this simple, fast and reliable alternative using the built-in System Catalog Information Function pg_get_indexdef():

    SELECT ci.relname AS index_name
         , ix.indrelid::regclass::text AS table_name
         , pg_get_indexdef (ix.indexrelid) AS idx_def
    FROM   pg_catalog.pg_index     ix
    JOIN   pg_catalog.pg_class     ci ON ci.oid = ix.indexrelid
    JOIN   pg_catalog.pg_namespace ns ON ns.oid = ci.relnamespace
    WHERE  ix.indisunique = false
    AND    ns.nspname = 'my_schema'
    ORDER  BY 2, 1;
    

    The manual:

    Reconstructs the creating command for an index. (This is a decompiled reconstruction, not the original text of the command.)

    This gets all aspects right and keeps working across Postgres versions.

    Your query

    If you insist on decomposing the index definition, this query should basically work (as of Postgres 14):

    SELECT ci.relname AS index_name
         , ct.relname AS table_name
         , pg_get_indexdef (ix.indexrelid, pos::int, false) AS idx_expression
         , CASE WHEN ia.indopt & 1 = 1 THEN 'DESC' ELSE 'ASC' END AS direction
         , CASE WHEN ia.indopt & 2 = 2 THEN 'NULLS FIRST' ELSE 'NULLS LAST' END AS direction_nulls
         , pg_get_expr(ix.indpred, ix.indrelid) AS where_clause
         , ia.pos AS column_position
         , ix.indkey
         , ix.indoption
    FROM   pg_catalog.pg_index     ix
    JOIN   pg_catalog.pg_class     ct ON ct.oid = ix.indrelid
    JOIN   pg_catalog.pg_class     ci ON ci.oid = ix.indexrelid
    JOIN   pg_catalog.pg_namespace ns ON ns.oid = ci.relnamespace
    LEFT   JOIN pg_catalog.pg_tablespace ts ON ts.oid = ci.reltablespace
    CROSS  JOIN LATERAL unnest(ix.indkey, ix.indoption) WITH ORDINALITY AS ia(attnum, indopt, pos)
    WHERE  ix.indisunique = false
    AND    ns.nspname = 'my_schema'
    ORDER  BY ct.relname, ci.relname, ia.pos;
    

    But the "proper query" is far more stable and reliable.

    In particular I use unnest() with multiple arguments to unnest indkey and indoption in lockstep and with ordinal (1-based) position. See:

    About WITH ORDINALITY:

    I use pg_get_indexdef() to reconstruct each index field. This also covers expressions, not just plain table columns.

    I added direction_nulls indicating NULLS FIRST | LAST, see:

    And where_clause with a decompiled WHERE clause for partial indices (using pg_get_expr()).