Search code examples
arrayspostgresqlindexinghstore

Postgresql - index on individual array elements or index on keys in an hstore


I have a table of users, with a column uuid and a column tags:

| uuid varchar PRIMARY KEY | tags ????? |

I am not sure what type the column tags should be, hstore or varchar[]. I want it to contain a list of interests or categories, like 'burgers' or 'vegetables', such that I can query for all users who have any tags in a specified array (i.e. "Which users like any of 'burgers' 'vegetables' 'hotdogs'?") For this query to be fast, I imagine I should index on the individual categories however they are stored. I expect most users to have a small number of tags (0-5) but they could potentially have up to 100 or so. And there are many different options of tags (could be 1000+).

I believe I can index on keys in an hstore so that I know hstore type is an option. Is it possible to index on individual varchar elements of arrays? (I've seen posts about this but they were inconclusive.)


Postgres version 9.3.5


Solution

  • I would recommend separate tables for tags

    You already have Table users with uuid, let's say:

    CREATE TABLE users (
        uuid serial primary key,
        user_name text
    );
    

    Now the tags:

    CREATE TABLE tags (
        tag_id serial primary key,
        tag_name text
    );
    
    CREATE TABLE users_tags (
        uuid integer references users,
        tag_id integer references tags,
        primary key (uuid, tag_id)
    );
    

    Now you can easily query with for example:

    SELECT * FROM users
        JOIN users_tags USING (uuid)
        JOIN tags USING (tag_id)
        WHERE tag_name = 'Burgers';
    

    Now you can easily add index on tag_name. You can also easily force uniqueness on tag name or create a unique index on lower(tag_name) that would eliminate problems with capital letters in tag names (Burgers vs. BurgerS).

    A simpler solution would be to leave the tag table out and just create:

    CREATE TABLE users_tags (
        uuid integer references users,
        tag_name text,
        primary key (uuid, tag_name)
    );
    

    Whether you create a separate table for tags or just use users_tags -table mostly depends on how tags are used. Separate table is needed if you have a (mostly) defined set of tags and you maybe want to add info to a specific tag name later. The query "which users like 'hotdogs'" would suggest a separate tagle where tag 'hotdog' has a specific ID. If users can freely add all kinds of mumbojumbo tags and no info will be attached to them later then leave the separate table out.