sql postgresql set-returning-functions dataexplorer

Get each <tag> in String - stackexchange database

Mockup code for my problem:

SELECT Id FROM Tags WHERE TagName IN '<osx><keyboard><security><screen-lock>'

The problem in detail

I am trying to get tags used in 2011 from apple.stackexchange data. (this query)

As you can see, tags in tag changes are stored as plain text in the Text field.

<tag1><tag2><tag3>
<osx><keyboard><security><screen-lock>

How can I create a unique list of the tags, to look them up in the Tags table, instead of this hardcoded version:

SELECT * FROM Tags
  WHERE TagName = 'osx' 
     OR TagName = 'keyboard' 
     OR TagName = 'security'

Here is a interactive example.

Stackexchange uses T-SQL, my local copy is running under postgresql using Postgres app version 9.4.5.0.

Solution

Assuming this table definition:

CREATE TABLE posthistory(post_id int PRIMARY KEY, tags text);

Depending on what you want exactly:

To convert the string to an array, trim leading and trailing '<>', then treat '><' as separator:

SELECT *, string_to_array(trim(tags, '><'), '><') AS tag_arr
FROM   posthistory;

To get list of unique tags for whole table (I guess you want this):

SELECT DISTINCT tag
FROM   posthistory, unnest(string_to_array(trim(tags, '><'), '><')) tag;

The implicit LATERAL join requires Postgres 9.3 or later.

This should be substantially faster than using regular expressions. If you want to try regexp, use regexp_split_to_table() instead of regexp_split_to_array() followed by unnest() like suggested in another answer:

SELECT DISTINCT tag
FROM   posthistory, regexp_split_to_table(trim(tags, '><'), '><') tag;

Also with implicit LATERAL join. Related:

To search for particular tags:

SELECT *
FROM   posthistory
WHERE  tags LIKE '%<security>%'
AND    tags LIKE '%<osx>%';

SQL Fiddle.

Applied to your search in T-SQL in our data explorer:

SELECT TOP 100
       PostId, UserId, Text AS Tags FROM PostHistory
WHERE  year(CreationDate) = 2011
AND    PostHistoryTypeId IN (3  -- initial tags
                           , 6  -- edit tags
                           , 9) -- rollback tags
AND    Text LIKE ('%<' + ##TagName:String?postgresql## + '>%');

(T-SQL syntax uses the non-standard + instead of ||.)
https://data.stackexchange.com/apple/query/edit/417055