I have this data check integrity code for an oncology study I'm working on. This is ostensibly to confirm TU,TR and RS are consistent.
proc sql ;
create table tu_tr_rs as
select tu.*,tr.*,rs.*
from trans.tu as tu
left join trans.tr as tr on tu.usubjid=tr.usubjid and tu.TULNKID
=tr.TRLNKID and tu.tudtc=tr.trdtc
left join trans.rs as rs on tr.usubjid=rs.usubjid and tr.trdtc=
rs.rsdtc
;
quit;
However, when I run this code I get the warning
"Variable XXXX already exists on file WORK.TU_TR_RS."
When I add the feedback option to PROC SQL to get a more granular look I get this
So I know if it's one variable that brings this warning up you can use a rename/DROP combination to work around it but for this case is it just the case that I have to explicitly state the variables for each dataset in the select statement or is there something fundamentally wrong with the code?
Yes, if you want to select columns with the same name from 2 (or more) data sets, you simply need to select them explicitly and give them distinct names. Something like this:
create table tu_tr_rs as
select
tu.ColA as tu_ColA
,tu.ColB as tu_ColB
/* etc */
,tr.ColA as tr_ColA
,tr.ColB as tr_ColB
/* etc */
,rs.ColA as rs_ColA
,rs.ColB as rs_ColB
/* etc */
from trans.tu as tu
/* etc */