I have a table of users, with a column uuid and a column tags:
| uuid varchar PRIMARY KEY | tags ????? |
I am not sure what type the column tags
should be, hstore or varchar[]. I want it to contain a list of interests or categories, like 'burgers' or 'vegetables', such that I can query for all users who have any tags in a specified array (i.e. "Which users like any of 'burgers' 'vegetables' 'hotdogs'?") For this query to be fast, I imagine I should index on the individual categories however they are stored. I expect most users to have a small number of tags (0-5) but they could potentially have up to 100 or so. And there are many different options of tags (could be 1000+).
I believe I can index on keys in an hstore so that I know hstore type is an option. Is it possible to index on individual varchar elements of arrays? (I've seen posts about this but they were inconclusive.)
Postgres version 9.3.5
I would recommend separate tables for tags
You already have Table users with uuid, let's say:
CREATE TABLE users (
uuid serial primary key,
user_name text
);
Now the tags:
CREATE TABLE tags (
tag_id serial primary key,
tag_name text
);
CREATE TABLE users_tags (
uuid integer references users,
tag_id integer references tags,
primary key (uuid, tag_id)
);
Now you can easily query with for example:
SELECT * FROM users
JOIN users_tags USING (uuid)
JOIN tags USING (tag_id)
WHERE tag_name = 'Burgers';
Now you can easily add index on tag_name. You can also easily force uniqueness on tag name or create a unique index on lower(tag_name) that would eliminate problems with capital letters in tag names (Burgers vs. BurgerS).
A simpler solution would be to leave the tag table out and just create:
CREATE TABLE users_tags (
uuid integer references users,
tag_name text,
primary key (uuid, tag_name)
);
Whether you create a separate table for tags or just use users_tags -table mostly depends on how tags are used. Separate table is needed if you have a (mostly) defined set of tags and you maybe want to add info to a specific tag name later. The query "which users like 'hotdogs'" would suggest a separate tagle where tag 'hotdog' has a specific ID. If users can freely add all kinds of mumbojumbo tags and no info will be attached to them later then leave the separate table out.