Search code examples
performancepostgresqlindexingoperator-overloadingpostgresql-9.5

GIN index on smallint[] column not used or error "operator is not unique"


create table test(
    id serial primary key,
    tagged smallint[]
);

There is gin index on tagged column, with _int2_ops operator class:

CREATE INDEX ix ON test USING GIN(col _int2_ops);

When I run this query:

select * from test
where tagged @> ARRAY[11]
order by id limit 100;

EXPLAIN ANALYZE shows:

Limit  (cost=0.43..19524.39 rows=100 width=36) (actual time=25024.124..25027.263 rows=100 loops=1)
  ->  Index Scan using test_pkey on test  (cost=0.43..508404.37 rows=2604 width=36) (actual time=25024.121..25027.251 rows=100 loops=1)
        Filter: ((tagged)::integer[] @> '{11}'::integer[])
        Rows Removed by Filter: 2399999
Planning time: 6.912 ms
Execution time: 25027.307 ms

Bold emphasis mine. Why is the tagged column converted to integer[] type? I think this is the reason why GIN the index isn't used and the query runs slow.

I tried WHERE tagged @> ARRAY[11]::smallint[] but got this error:

operator is not unique: smallint[] @> smallint[]

If I do the same but use tagged int[] and create index as

CREATE INDEX ix ON test USING GIN(tagged gin__int_ops);

then the above query uses the GIN index:

"->  Bitmap Index Scan on ix  (cost=0.00..1575.53 rows=2604 width=0) (actual time=382.840..382.840 rows=2604480 loops=1)"
"   Index Cond: (tagged @> '{11}'::integer[])"

This is a bit faster than previous, but It takes on average 10 second - still too slow. I want to try smallint[] instead of int[], maybe that will be faster ...


Solution

  • Solution

    Most probably, the solution is to schema-qualify the operator:

    SELECT *
    FROM   test
    WHERE  tagged OPERATOR(pg_catalog.@>) '{11}'::int2[]
    ORDER  BY id
    LIMIT  100;

    Why?

    It's a problem of operator resolution (in combination with type resolution and cast context).

    In standard Postgres, there is only a single candidate operator anyarray @> anyarray, that's the one you want.

    Your setup would work just fine if you had not installed the additional module intarray (my assumption), which provides another operator for integer[] @> integer[].

    Hence, another solution would be to use integer[] instead and have a GIN index with the gin__int_ops operator class. Or try the (default for intarray) gist__int_ops index. Either might be faster, but both don't allow NULL values.
    Or you could rename the intarray operator @> to disambiguate. (I would not do that. Upgrade and portability issues ensue.)

    For expressions involving at least one operand of type integer[], Postgres knows which operator to pick: the intarray operator. But then the index is not applicable, because the intarray operator only operates on integer (int4) not int2. And indexes are strictly bound to operators:

    But for int2[] @> int2[], Postgres is unable to decide the best operator. Both seem equally applicable. Since the default operator is provided in the pg_catalog schema and the intarray operator is provided in the public schema (by default - or wherever you installed the extension), you can help solve the conundrum by schema-qualifying the operator with the OPERATOR() construct. Related:

    The error message you get is a bit misleading. But if you look closely, there is a HINT line added which hints (tada!) in the right direction:

    ERROR:  operator is not unique: smallint[] @> smallint[]
    LINE 1: SELECT NULL::int2[] @> NULL::int2[]
                                ^
    HINT:  Could not choose a best candidate operator. You might need to add explicit type casts.
    

    You can investigate existing operator candidates for @> with:

    SELECT o.oid, *, oprleft::regtype, oprright::regtype, n.nspname
    FROM   pg_operator o
    JOIN   pg_namespace n ON n.oid = o.oprnamespace
    WHERE  oprname = '@>';
    

    Another alternative solution would be to temporarily(!) set a different search_path, so only the desired operator is found. In the same transaction:

    SET LOCAL search_path = pg_catalog;
    SELECT ...
    

    But then you have to schema-qualify all tables in the query.

    About cast context:

    You could change the castcontext of int2 -> int4. But I strongly advise against it. Too many possible side effects: