Search code examples
stored-proceduresamazon-redshiftplpgsqldatabase-cursor

An example of a Cursor & UPDATE statement in conjunction inside a procedure on Redshift


Would you please provide an an example for a Redshift procedure where you have used a cursor and an UPDATE statement in conjunction? Is that even feasible, I couldn't find an example. I'm looking for a simple template code to learn how to have these 2 together in a single procedure on Redshift.

Here is an example use case:

I have a table like this:

CREATE TABLE test_tbl 
  (
   Contactid VARCHAR(500), 
   sfdc_OppId_01 VARCHAR(500),
   sfdc_OppId_02 VARCHAR(500),
   sfdc_OppId_03 VARCHAR(500),
   sfdc_OppId_04 VARCHAR(500),
   sfdc_OppId_05 VARCHAR(500),
   sfdc_OppId_06 VARCHAR(500)       
   )

I want to update each sfdc_OppId_xx with the relative value from another table; sfdc_tbl. Here is what sfdc_tbl looks like:

sfdc_contactId sfdc_Opp_Id
AA123hgt 999999
AA123hgt 888888
AA123hgt 777777
AA123hgt 432567
AA123hgt 098765
AA123hgt 112789

So as you see, there are duplicate sfdc_contactid in the sfdc_tbl. My final goal is to list all the sfdc_Opp_Id for given contactid horizontally in the test_tbl. I shall not have duplicate contactid in the test_tbl.

  INSERT INTO test_tbl (Contactid) 
  SELECT sfdc_contactId
  FROM sfdc_tbl
  GROUP BY sfdc_contactId

And here is what I'm trying to do:

CREATE OR REPLACE PROCEDURE testing_procedure (results INOUT refcursor)
AS 
$$ 
BEGIN 
       OPEN cursor_testing FOR
           SELECT 
                  Ops.sfdc_Opp.id, 
                  ROW_NUMBER () OVER(PARTITION BY Ops.sfdc_contactId ORDER BY sfdc_Opp_Id ) RWN

            FROM sfdc_tbl Ops
            INNER JOIN test_tbl tbl 
            ON Ops.sfdc_contactId = tbl.contactid; 

   UPDATE test_tbl
     SET sfdc_Opp_01 =  CASE WHEN cursor_testing.RWN = 1 THEN cursor_testing.sfdc_Ops_id ELSE NULL END,
         sfdc_Opp_02 =  CASE WHEN cursor_testing.RWN = 2 THEN cursor_testing.sfdc_Ops_id ELSE NULL END,
         sfdc_Opp_03 =  CASE WHEN cursor_testing.RWN = 3 THEN cursor_testing.sfdc_Ops_id ELSE NULL END,
         sfdc_Opp_04 =  CASE WHEN cursor_testing.RWN = 4 THEN cursor_testing.sfdc_Ops_id ELSE NULL END,
         sfdc_Opp_05 =  CASE WHEN cursor_testing.RWN = 5 THEN cursor_testing.sfdc_Ops_id ELSE NULL END,
         sfdc_Opp_06 =  CASE WHEN cursor_testing.RWN = 6 THEN cursor_testing.sfdc_Ops_id ELSE NULL END
         ;
                         

END; 
$$ 
LANGUAGE plpgsql; 

I keep getting an error

incorrect syntax at or near "cursor_testing"


Solution

  • I've answered a question with a similar solution. The SQL uses a cursor's data to INSERT into a table and this same path should work for UPDATE - How to join System tables or Information Schema tables with User defined tables in Redshift

    That being said and looking at your code I really think you would be better off using a temp table rather than a cursor. The first thing to note is that a cursor is not a table. Your use pattern won't work AFAIK. You read a cursor row by row (or bunches) which is contrary to Redshift's columnar table storage. So you will need to loop on the rows from the cursor and perform N updates. This will be extremely slow! You would be querying columnar data, storing the results in a cursor as rows, reading these row one by one, and then performing a new query (UPDATE) for each row. Ick! Stay in "columnar land" and use a temp table.