I'm struggling to see my way out of a logic problem with string concatenation on Redshift because of two limitations:
||
operator cause the whole string to be Null (standard SQL behaviour)concat()
function at a timeI'm building a Stored Procedure that takes in some parameters and does some simple logic to insert data into a table by concatenating lots of variables to build an INSERT statement.
Consider the following example:
create table someTable(
someString varchar NOT NULL,
anotherString varchar,
someDate datetime NOT NULL,
aNullableDate datetime
)
diststyle even;
Depending on the incoming parameters, the logic of the Stored Procedure dictates that some of these values should be Null in some cases, specifically one of the date fields:
create or replace procedure writeSomeData(varchar,datetime)
language plpgsql
AS $$
declare
v_inputString ALIAS for $1;
v_inputDate ALIAS for $2;
v_anotherString varchar;
v_nullableDate datetime;
v_sql varchar;
BEGIN
-- Do a bunch of logic
if v_inputString = 'foo' then
v_anotherString := null;
else
v_anotherString := 'bar';
end if;
if v_inputDate <= '2000-01-01' then
v_nullableDate := null;
else
v_nullableDate := getDate();
end if;
-- Build an INSERT statement using all the appropriate variables
v_sql := 'INSERT INTO SOMETABLE (someString,anotherString,someDate,aNullableDate)
VALUES (''' || v_inputString || ''' , ''' || v_anotherString
|| ''' , ''' || v_inputDate || ''' , ''' || v_nullableDate || ''');';
raise notice 'Value of v_sql: %', v_sql;
execute v_sql;
END;
$$
The problem is, of course, that because two of the variables in the series could be null, v_sql
will often resolve to NULL and cause the execute v_sql
to fail. To be clear, NULL is a valid value for these columns, but for string values it'd be feasible to use COALESCE
and just capture empty strings (''
).
However, COALESCE
doesn't help in this particular case because Redshift doesn't accept ''
as a Datetime value, and we can't use something like CASE
to pass NULL because it nullifies the whole variable...
The answer in any other PostgreSQL context would be to use CONCAT()
, where one can join as many strings as necessary together, and Null values are ignored correctly. However, because Redshift only allows two (?!) strings to be joined at a time, I'm worried my only solution is the brute force method:
v_sql varchar;
v_string_1 varchar;
v_string_2 varchar;
v_string_3 varchar
-- ...
v_string_n varchar;
v_sql := concat(v_string_1, v_string_2);
v_sql := concat(v_sql, v_string_3);
-----
v_sql := concat(v_sql, v_string_n);
If it were just four strings to join it'd be stupid, but fine. However, I have many variables going into this INSERT statement - am I doomed? Or is there a better way to go about this problem?
You're not doomed but there is more going on here. You want to build a SQL statement that will insert the desired values, including NULL. You are using a stored procedure. The resulting statement needs to be valid for the data types of the target table. You are running into issues when you want to insert NULL.
The problem with your proposed solution is that you will need the keyword 'NULL' in the insert statement to insert a NULL. You can't just have back to back commas.
You will want to build the strings needed for the INSERT statement which may not the same values as the data being inserted. Specifically in the case of NULL. So when you want to insert a date the string needs to be '2023-08-28' (with quotes) but when you want NULL the string is just NULL (no quotes). For example you code snippet:
if v_inputDate <= '2000-01-01' then
v_nullableDate := null;
else
v_nullableDate := getDate();
end if;
would become:
if v_inputDate <= '\'2000-01-01\'' then
v_nullableDate := 'null';
else
v_nullableDate := '\'' || getDate()::text || '\'';
end if;
Then your INSERT statement would just need to build the INSERT statement text. Like:
v_sql := 'INSERT INTO SOMETABLE (someString,anotherString,someDate,aNullableDate)
VALUES (' || v_inputString || ' , ' || v_anotherString
|| ' , ' || v_inputDate || ' , ' || v_nullableDate || ');';
Because we've taken the surrounding quotes out of the INSERT concatenation, we need to add them back in somewhere. You could change the call writeSomeData()
statement to add extra quotes on each parameter, but that would enforce the peculiarity of the proc onto the consumer. And because ALIAS for $1
creates a Constant, we will need to add a new varchar to include quotes for each incoming parameter:
v_inputString ALIAS for $1;
v_inputDate ALIAS for $2;
...
v_quotedInputString = '\'' || v_inputString || '\'';
v_quotedInputDate = '\'' || v_inputDate::text || '\'';
Then you just change your INSERT statement where the incoming parameters were to use their 'quoted' values:
v_sql := 'INSERT INTO SOMETABLE (someString,anotherString,someDate,aNullableDate)
VALUES (' || v_quotedInputString || ' , ' || v_anotherString
|| ' , ' || v_quotedInputDate || ' , ' || v_nullableDate || ');';
You may still need to handle your date parameters if they can be submitted as NULL (ie. for inputDate in the example), because NULL::text is still NULL, and the ||
operator will cause v_quotedInputDate
to be NULL instead of 'null'.