Search code examples
postgresqlruby-on-rails-4rails-postgresql

Querying a PostgreSQL multi-dimensional array data type in Rails 4


I am using a PostgreSQL multi-dimensional array to mimic an array of hashes, and I am looking for a way to locate a record by a key-value pair in that array like e.g ["key1","value1"]. An example array is:

[ ["key1","value1"], ["key2","value2"] ]

The keys are in my case languages and a key may occur more than once.

Is there an efficient way to locate a record by a key-value pair using a PostgreSQL index with Rails 4?

Edit: fixed typo


Solution

  • I am using a PostgreSQL multi-dimensional array to mimic an array of hashes

    Those two things aren't really all that similar, and I wouldn't recommend attempting to use multidimensional arrays to model nested hashes.

    Pavel is quite right that hstore is probably a lot closer to what you want, and it's indexable too. However, the current version of hstore (in Pg 9.3 and older) supports only single-level keys; it's a dictionary/hash that can contain only scalar string values. A planned enhancement to hstore for PostgreSQL 9.4 will hopefully bring multi-level nesting and JSON syntax compatibility.

    Ordinary tables

    You can model arbitrary-depth key/value chains(and trees/graphs) using edgelists and recursive CTEs, but this probably rather more complexity than you really want.

    If you only need a fixed two-level key/value list, simply use a table that lists both key levels:

    CREATE TABLE twolevel(key1 text, key2 text, thevalue text not null, PRIMARY KEY(key1,key2));
    

    This lets you constrain against duplicate key pairs, which is nice.

    You can also use two tables with a foreign key relationship between them. This gives you cascade deletes if you want, so removing a top level key removes all sub-level keys and associated values. It's easy enough to do that with the single-table approach, though.

    Use one of these two approaches unless you have good reasons to do otherwise.

    Hstore as text

    Until the extended hstore is available, one option would be to store text representations of nested hstore fields. This is not pretty or efficient, but it's probably better than trying to search a multidimensional array.

    CREATE TABLE nested_hstore(id integer, blah hstore);
    
    insert into nested_hstore(id, blah) values 
    (1, hstore( ARRAY['key1','key2'], ARRAY['"key1.1"=>"value1.1", "key1.2"=>"value1.2"', '"key2.1"=>"value2.1", "key2.2"=>"value2.2"']::hstore[]::text[]));
    

    Test:

    regress=> select (blah->'key1')::hstore->'key1.1' from nested_hstore ;
     ?column? 
    ----------
     value1.1
    (1 row)
    

    Because the hstore must be parsed each time it's not going to be super-fast, and you won't get the usual indexing benefits on the second level. Still, it's an option if you really genuinely need two-level hashes in fields.

    Tables of hstore values

    You can combine these two quite reasonably.

    CREATE TABLE twolevel(key1 text, level2keyvalues hstore);
    

    It seems pretty ugly to me, though; I'd prefer to be consistent one way or the other.

    SQL/XML

    Another option is to use SQL/XML, which you can index along arbitrary XPATH expressions. Again, this seems a bit too complicated.