Search code examples
ruby-on-railspostgresqlruby-on-rails-4rails-activerecordarray-column

Query against a Postgres array column type


TL;DR I'm wondering what the pros and cons are (or if they are even equivalent) between @> {as_champion, whatever} and using IN ('as_champion', 'whatever') is. Details below:

I'm working with Rails and using Postgres' array column type, but having to use raw sql for my query as the Rails finder methods don't play nicely with it. I found a way that works, but wondering what the preferred method is:

The roles column on the Memberships table is my array column. It was added via rails as so:

add_column :memberships, :roles, :text, array: true

When I examine the table, it shows the type as: text[] (not sure if that is truly how Postgres represents an array column or if that is Rails shenanigans.

To query against it I do something like:

Membership.where("roles @> ?", '{as_champion, whatever}')

Solution

  • From the fine Array Operators manual:

    Operator: @>
    Description: contains
    Example: ARRAY[1,4,3] @> ARRAY[3,1]
    Result: t (AKA true)

    So @> treats its operand arrays as sets and checks if the right side is a subset of the left side.

    IN is a little different and is used with subqueries:

    9.22.2. IN

    expression IN (subquery)
    

    The right-hand side is a parenthesized subquery, which must return exactly one column. The left-hand expression is evaluated and compared to each row of the subquery result. The result of IN is "true" if any equal subquery row is found. The result is "false" if no equal row is found (including the case where the subquery returns no rows).

    or with literal lists:

    9.23.1. IN

    expression IN (value [, ...])
    

    The right-hand side is a parenthesized list of scalar expressions. The result is "true" if the left-hand expression's result is equal to any of the right-hand expressions. This is a shorthand notation for

    expression = value1
    OR
    expression = value2
    OR
    ...
    

    So a IN b more or less means:

    Is the value a equal to any of the values in the list b (which can be a query producing single element rows or a literal list).

    Of course, you can say things like:

    array[1] in (select some_array from ...)
    array[1] in (array[1], array[2,3])
    

    but the arrays in those cases are still treated like single values (that just happen to have some internal structure).


    If you want to check if an array contains any of a list of values then @> isn't what you want. Consider this:

    array[1,2] @> array[2,4]
    

    4 isn't in array[1,2] so array[2,4] is not a subset of array[1,2].

    If you want to check if someone has both roles then:

    roles @> array['as_champion', 'whatever']
    

    is the right expression but if you want to check if roles is any of those values then you want the overlaps operator (&&):

    roles && array['as_champion', 'whatever']
    

    Note that I'm using the "array constructor" syntax for the arrays everywhere, that's because it is much more convenient for working with a tool (such as ActiveRecord) that knows to expand an array into a comma delimited list when replacing a placeholder but doesn't fully understand SQL arrays.

    Given all that, we can say things like:

    Membership.where('roles @> array[?]', %w[as_champion whatever])
    Membership.where('roles @> array[:roles]', :roles => some_ruby_array_of_strings)
    

    and everything will work as expected. You're still working with little SQL snippets (as ActiveRecord doesn't have a full understanding of SQL arrays or any way of representing the @> operator) but at least you won't have to worry about quoting problems. You could probably go through AREL to manually add @> support but I find that AREL quickly devolves into an incomprehensible and unreadable mess for all but the most trivial uses.