Search code examples
sqlstringcsvsplitvertica

SQL function to generate new row per event?


I've looked for an answer here to no avail; wondering if this problem is best suited outside SQL environment but thought I'd see how it could be solved.

I'm trying to look at web journeys and as a result need my data to be in the following format:

ID      DATE       EVENT
1       01/01/20   "Landing Page"
1       01/01/20   "purchase page"
2...etc

At present the data is in the following format:

ID       DATE       EVENT
1        01/01/20   "Landing page","purchase page"
2... etc

I essentially want to break out the event field by comma and create a new row for each comma separated event with all other fields duplicated. My SQL isn't great, have tried numerous temp tables, unions and split_parts to break out the field to some success but cannot get it into its own row.


Solution

  • Indeed it works as @GMB suggested.

    Only that the function StringTokenizerDelim() is not in the default search path.

    You'll have to qualify it explicitly with the v_txtindex schema in which it resides:

    WITH
    -- your input
    indata(id,dt,ev) AS (
              SELECT 1, DATE '2020-01-01','landing page,purchase page'
    UNION ALL SELECT 2, DATE '2020-01-02','landingpage,browse-article page,purchase page'
    )
    ,
    tokens AS (
      SELECT
        id
      , dt
      , v_txtindex.StringTokenizerDelim(ev,',') OVER(PARTITION BY id,dt)
      FROM indata
    )
    SELECT
      id
    , dt
    , words AS event
    FROM tokens;
    -- out  id |     dt     |        event        
    -- out ----+------------+---------------------
    -- out   1 | 2020-01-01 | landing page
    -- out   1 | 2020-01-01 | purchase page
    -- out   2 | 2020-01-02 | landingpage
    -- out   2 | 2020-01-02 | browse-article page
    -- out   2 | 2020-01-02 | purchase page