Search code examples
postgresqlsubquerywhere-clausepostgresql-14

View (and subquery) returns rows without filtering


Postgresql 14.

Hi! I have multiple tables that are named like model_<uuid> and each table has columns that are named like element_<uuid>. All of these tables are in models schema. And I have query that transform information_schema.columns view in something like m2m table.

SELECT
  SPLIT_PART(table_name, '_', 2)::UUID AS model_id,
  SPLIT_PART(column_name, '_', 2)::UUID AS element_id
FROM information_schema.columns
WHERE
  table_schema = 'models' AND
  table_name LIKE 'model\_%' AND
  column_name LIKE 'element\_%'

It works properly. But when I try to wrap it in a view or use it as a subquery, I get the following error.

Subquery example:

SELECT model_id, ARRAY_AGG(element_id) AS element_ids
FROM (
  SELECT
    SPLIT_PART(table_name, '_', 2)::UUID AS model_id,
    SPLIT_PART(column_name, '_', 2)::UUID AS element_id
  FROM information_schema.columns
  WHERE
    table_schema = 'models' AND
    table_name LIKE 'model\_%' AND
    column_name LIKE 'element\_%'
) AS t
WHERE model_id = '2ba221e1-f2e3-4438-bafb-8e7bfefe161f'
GROUP BY model_id

Error:

ERROR: invalid input syntax for type uuid: "elements"

I have table elements but it is in public schema and I have table_schema = 'models' filter. I think that this filter does not work and subquery tries to cast it to 'uuid'. But when I run the subquery as a normal (root?) query, all rows are taken from the models schema.

And when I remove WHERE model_id = '2ba221e1-f2e3-4438-bafb-8e7bfefe161f' it's working properly again.

What is wrong? It literally blows my mind.

UPD:
Okay. I understand the reason. The reason for the error is that the query optimizer moves the query filter model_id = '...' inside the subquery filters. Error was raised on public.model_elements view.

I remove all UUIDs in table_name from query and run this query:

EXPLAIN SELECT model_id, ARRAY_AGG(element_id)
FROM (
    SELECT
        table_name AS model_id,
        SPLIT_PART(column_name, '_', 2)::UUID AS element_id
    FROM information_schema.columns
    WHERE
        table_schema = 'models'
        AND table_name LIKE 'model\_%'
        AND column_name LIKE 'element\_%'
) AS t
WHERE model_id = 'abc'
GROUP BY model_id;

Fragment from query plan:

->  Index Scan using pg_class_relname_nsp_index on pg_class c  (cost=0.28..4.31 rows=1 width=76)
    Index Cond: ((relname >= 'model_'::text) AND (relname < 'model`'::text) AND (relname = 'abc'::name))
    Filter: ((relname ~~ 'model\_%'::text) AND (relkind = ANY ('{r,v,f,p}'::""char""[])))"
->  Index Scan using pg_attribute_relid_attnam_index on pg_attribute a  (cost=0.29..4.32 rows=1 width=78)
    Index Cond: ((attrelid = c.oid) AND (attname >= 'element_'::text) AND (attname < 'element`'::text))
    Filter: ((NOT attisdropped) AND (attnum > 0) AND (attname ~~ 'element\_%'::text) AND (pg_has_role(c.relowner, 'USAGE'::text) OR has_column_privilege(c.oid, attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))"

As you can see, it has combined all filters (root filters and subquery filters) by table_name, including the SPLIT_PART(table_name, '_', 2)::UUID transformation (if we consider the original query) and uses them first. That is, first it tries to convert all table names into UUIDs, ignoring those filters that filter out tables for which it can't be done.

I found the problem but how to fix it?


Solution

  • PostgreSQL has to calculate SPLIT_PART(table_name, '_', 2)::UUID before it can compare it to '2ba221e1-f2e3-4438-bafb-8e7bfefe161f'? Simply avoid the cast to uuid and compare strings.

    Also, PostgreSQL is free to rearrange the query as long as it remains semantically equivalent. In the case at hand, it pulls up the subquery and evaluates the index conditions first. If you want to prevent PostgreSQL from pulling up the subquery, add OFFSET 0 at the end of the subquery. That makes no semantic difference, but is an optimizer barrier.