In the database we have email address dataset as following. Please notice that there are two observations for id 1003
data Email;
input id$ email $20.;
datalines;
1001 1001@gmail.com
1002 1002@gmail.com
1003 1003@gmail.com
1003 2003@gmail.com
;
run;
And we receive user request to change the email address as following,
data amendEmail;
input id$ email $20.;
datalines;
1003 1003@yahoo.com
;
run;
I attempt to using the update
statement in the data step
data newEmail;
update Email amendEmail;
by id;
run;
While it only change the first observation for id 1003.
My desired output would be 1001 1001@gmail.com 1002 1002@gmail.com 1003 1003@yahoo.com 1003 1003@yahoo.com
is it possible using non proc sql method?
If you want to change both rows, you will end up with duplicates. You should probably address the issue of duplicates in your source table to begin with.
If you need a working solution with duplicated results, consider using PROC SQL with LEFT JOIN and conditional clause for email address.
PROC SQL;
CREATE TABLE EGTASK.QUERY_FOR_EMAIL AS
SELECT t1.id,
/* email */
(CASE WHEN t1.id = t2.id THEN t2.email
ELSE t1.email
END) AS email
FROM WORK.EMAIL t1
LEFT JOIN WORK.AMENDEMAIL t2 ON (t1.id = t2.id);
QUIT;
As per comments, if you prefer to use data step, you can use the following:
data want (drop=email2);
merge Email amendEmail (rename=(email=email2));
by id;
if email2 ne "" then email=email2;
run;