Search code examples
sasdatastep

update statement in data step


In the database we have email address dataset as following. Please notice that there are two observations for id 1003

data Email;
    input id$ email $20.;
    datalines;
1001 1001@gmail.com
1002 1002@gmail.com
1003 1003@gmail.com
1003 2003@gmail.com
;
run;

And we receive user request to change the email address as following,

data amendEmail;
    input id$ email $20.;
    datalines;
1003 1003@yahoo.com
;
run;

I attempt to using the update statement in the data step

data newEmail;
    update Email amendEmail;
    by id;
run;

While it only change the first observation for id 1003.

My desired output would be 1001 1001@gmail.com 1002 1002@gmail.com 1003 1003@yahoo.com 1003 1003@yahoo.com

is it possible using non proc sql method?


Solution

  • If you want to change both rows, you will end up with duplicates. You should probably address the issue of duplicates in your source table to begin with.

    If you need a working solution with duplicated results, consider using PROC SQL with LEFT JOIN and conditional clause for email address.

    PROC SQL;
        CREATE TABLE EGTASK.QUERY_FOR_EMAIL AS 
            SELECT t1.id, 
               /* email */
                (CASE WHEN t1.id = t2.id THEN t2.email 
                ELSE t1.email 
                END) AS email 
            FROM WORK.EMAIL t1 
            LEFT JOIN WORK.AMENDEMAIL t2 ON (t1.id = t2.id);
    QUIT;
    

    As per comments, if you prefer to use data step, you can use the following:

    data want (drop=email2);
      merge Email amendEmail (rename=(email=email2));
      by id;
      if email2 ne "" then email=email2; 
    run;