Search code examples
sqlstored-proceduresamazon-redshiftstring-concatenation

Redshift ProcSQL: Is there a better way to concatenate many variables, including possibly-null dates, into a single variable in Redshift?


I'm struggling to see my way out of a logic problem with string concatenation on Redshift because of two limitations:

  1. Any Null values in a set of strings concatenated with the || operator cause the whole string to be Null (standard SQL behaviour)
  2. Redshift only allows two strings to be joined with the concat() function at a time

I'm building a Stored Procedure that takes in some parameters and does some simple logic to insert data into a table by concatenating lots of variables to build an INSERT statement.

Consider the following example:

create table someTable(
    someString varchar NOT NULL,
    anotherString varchar,
    someDate datetime NOT NULL,
    aNullableDate datetime
)
diststyle even;

Depending on the incoming parameters, the logic of the Stored Procedure dictates that some of these values should be Null in some cases, specifically one of the date fields:

create or replace procedure writeSomeData(varchar,datetime)
    language plpgsql
AS $$
    declare
        v_inputString ALIAS for $1;
        v_inputDate ALIAS for $2;

        v_anotherString varchar;
        v_nullableDate datetime;

        v_sql varchar;
BEGIN
-- Do a bunch of logic
    if v_inputString = 'foo' then
       v_anotherString := null;
    else
       v_anotherString := 'bar';
    end if;

    if v_inputDate <= '2000-01-01' then
       v_nullableDate := null;
    else
       v_nullableDate := getDate();
    end if;

-- Build an INSERT statement using all the appropriate variables
    v_sql := 'INSERT INTO SOMETABLE (someString,anotherString,someDate,aNullableDate) 
             VALUES (''' || v_inputString || ''' , ''' || v_anotherString 
             || ''' , ''' || v_inputDate || ''' , ''' || v_nullableDate || ''');';

   raise notice 'Value of v_sql: %', v_sql;
   execute v_sql;
   
END;
$$

The problem is, of course, that because two of the variables in the series could be null, v_sql will often resolve to NULL and cause the execute v_sql to fail. To be clear, NULL is a valid value for these columns, but for string values it'd be feasible to use COALESCE and just capture empty strings (''). However, COALESCE doesn't help in this particular case because Redshift doesn't accept '' as a Datetime value, and we can't use something like CASE to pass NULL because it nullifies the whole variable...

The answer in any other PostgreSQL context would be to use CONCAT(), where one can join as many strings as necessary together, and Null values are ignored correctly. However, because Redshift only allows two (?!) strings to be joined at a time, I'm worried my only solution is the brute force method:

v_sql varchar;
v_string_1 varchar;
v_string_2 varchar;
v_string_3 varchar
--      ...
v_string_n varchar;

v_sql := concat(v_string_1, v_string_2);
v_sql := concat(v_sql, v_string_3);
-----
v_sql := concat(v_sql, v_string_n);

If it were just four strings to join it'd be stupid, but fine. However, I have many variables going into this INSERT statement - am I doomed? Or is there a better way to go about this problem?


Solution

  • You're not doomed but there is more going on here. You want to build a SQL statement that will insert the desired values, including NULL. You are using a stored procedure. The resulting statement needs to be valid for the data types of the target table. You are running into issues when you want to insert NULL.

    The problem with your proposed solution is that you will need the keyword 'NULL' in the insert statement to insert a NULL. You can't just have back to back commas.

    You will want to build the strings needed for the INSERT statement which may not the same values as the data being inserted. Specifically in the case of NULL. So when you want to insert a date the string needs to be '2023-08-28' (with quotes) but when you want NULL the string is just NULL (no quotes). For example you code snippet:

    if v_inputDate <= '2000-01-01' then
       v_nullableDate := null;
    else
       v_nullableDate := getDate();
    end if;
    

    would become:

    if v_inputDate <= '\'2000-01-01\'' then
       v_nullableDate := 'null'; 
    else
       v_nullableDate := '\'' || getDate()::text || '\'';
    end if;
    

    Then your INSERT statement would just need to build the INSERT statement text. Like:

    v_sql := 'INSERT INTO SOMETABLE (someString,anotherString,someDate,aNullableDate) 
             VALUES (' || v_inputString || ' , ' || v_anotherString 
             || ' , ' || v_inputDate || ' , ' || v_nullableDate || ');';
    

    Because we've taken the surrounding quotes out of the INSERT concatenation, we need to add them back in somewhere. You could change the call writeSomeData() statement to add extra quotes on each parameter, but that would enforce the peculiarity of the proc onto the consumer. And because ALIAS for $1 creates a Constant, we will need to add a new varchar to include quotes for each incoming parameter:

    v_inputString ALIAS for $1;
    v_inputDate ALIAS for $2;
    ...
    v_quotedInputString = '\'' || v_inputString || '\'';
    v_quotedInputDate = '\'' || v_inputDate::text || '\'';
    

    Then you just change your INSERT statement where the incoming parameters were to use their 'quoted' values:

    v_sql := 'INSERT INTO SOMETABLE (someString,anotherString,someDate,aNullableDate) 
             VALUES (' || v_quotedInputString || ' , ' || v_anotherString 
             || ' , ' || v_quotedInputDate || ' , ' || v_nullableDate || ');';
    

    You may still need to handle your date parameters if they can be submitted as NULL (ie. for inputDate in the example), because NULL::text is still NULL, and the || operator will cause v_quotedInputDate to be NULL instead of 'null'.