Search code examples
amazon-redshiftetlmatillion

Create dynamic string for IN clause


We are using Matillion ETL API to pass query parameters to the underlying Redshift query. A query parameter variable for States is created using concatenation from the dropdown list and passed in the API as follows:

v_state=" & #encodeForURL("''AL'',''CA''")#

This variable is expected to be passed as and state in ('AL','CA') in the query but due to the presence of , it throws an error for using reserved character. I have tried escaping this character but it still doesn't work.

So instead of , I used | in the variable v_state=" & #encodeForURL("''AL''|''CA''")# and it does run the API but the underlying query doesn't return any data.

and state in ('AL','CA') (Returns data)

and state in (replace('''AL''|''CA''','|',',')) (Doesn’t return data)

The questions would be:

  1. In the Matillion API, is there a way to escape comma in the string?
  2. What's the correct way to use pipe delimited string for the IN clause in Redshift?

Edit 1: Found a way to convert state string into rows.

    with t as
        (select replace('AL|CO|CA|MN', '|', ',') as state)
   , seq_0_9 as (
    select 0 as num
    union all
    select 1 as num
    union all
    select 2 as num
    union all
    select 3 as num
    union all
    select 4 as num
    union all
    select 5 as num
    union all
    select 6 as num
    union all
    select 7 as num
    union all
    select 8 as num
    union all
    select 9 as num
)
   , seq_0_99 as (
    select a.num + b.num * 10 as num
    from seq_0_9 a,
         seq_0_9 b
)
SELECT split_part(t.state, ',', num) AS state
FROM t
         JOIN seq_0_99 seq
              ON num <= regexp_count(t.state, ',') + 1
WHERE num > 0;

Solution

  • Thanks for confirming. I am not sure how you hit a character escaping problem with variables in the Matillion REST API, because the variables are passed in via the POST body?

    In any case, here's something that worked for me:

    Create an Orchestration Job named EntryPointJob with Job Variables set like below.

    enter image description here

    Where

    • jv_commasep is the public input, and is expected to contain something like AL,CA
    • pjv_inlist is a private variable which gets updated to contain the same information in SQL compatible format like 'AL','CA'

    I used a Python3 script to derive pjv_inlist from jv_commasep with this code:

    context.updateVariable('pjv_inlist', ','.join(["'" + x.strip() + "'" for x in jv_commasep.split(',')]))
    
    

    Now, in the Matillion job you can use the pjv_inlist variable in SQL, for example like this: ... WHERE "state" IN (${pjv_inlist})

    To run the job and pass a scalar variables you first need to create a JSON file like below. I named mine RunVariablesContainer.json

    {
      "scalarVariables" : {
        "jv_commasep" : "AL,CA"
      }
    }
    

    Then you can call Matillion's own REST API to run the Matillion job (with parameters) like this:

    curl -k -X POST -u un:pw -H "Content-type: application/json" "https://.../rest/v1/group/name/.../project/name/.../version/name/.../job/name/EntryPointJob/run?environmentName=Demo" --data-binary @RunVariablesContainer.json