New to SQL - I'd like to join the fact table crselist to the crseinfo table to get the correct dimension info. I've been working on some correlated subqueries but none give the desired result (below). The crseinfo table says that beginning in 199610 Art 508 belongs to college 09 and should be called OkArt..which updates in 200220 and 200300. Crselist lists the courses actually taught.
data crseinfo ;
input crsenme $ crsenum crsefx crsecollege $ crsedesc $9.;
cards;
ART 508 199610 09 OkArt
ART 508 200220 18 WowItsArt
ART 508 200300 18 SuperArt
;
run;
data crselist;
input crsenme $ crsenum term section $;
cards;
ART 508 199610 01
ART 508 199610 02
ART 508 199610 03
ART 508 199710 01
ART 508 200220 01
ART 508 200220 02
ART 508 201020 01
ART 508 201120 01
;
run;
The desired result would then be:
data desired ;
input crsenme $ crsenum term section $ crsecollege $ crsedesc $9.;
cards;
ART 508 199610 01 09 OkArt
ART 508 199610 02 09 OkArt
ART 508 199610 03 09 OkArt
ART 508 199710 01 09 OkArt
ART 508 200220 01 18 WowItsArt
ART 508 200220 02 18 WowItsArt
ART 508 201020 01 18 SuperArt
ART 508 201120 01 18 SuperArt
;
Referring to the SAS help page (http://web.utk.edu/sas/OnlineTutor/1.2/en/60477/m70/m70_52.htm) it would seem like I could do something like:
proc sql ;
select *
from crseinfo a, crselist b
where a.crsenme eq b.crsenme and
a.crsenum eq b.crsenum and
b.term eq (select min(c.term)
from crselist c
where c.term ge a.crsefx )
;
quit;
But this does not work. I am interested in a SQL-based solution - Thank-you for your time.
You're nearly there. Rather than using a correlated subquery, I think it's simpler to do this using a combination of having
and group by
clauses:
proc sql noprint _method;
create table desired2 as
select a.*, b.crsecollege, b.crsedesc
from crselist a left join crseinfo b
on a.crsenme = b.crsenme and a.crsenum = b.crsenum
where a.term ge b.crsefx
group by a.crsenme, a.crsenum, a.term
having b.crsefx = max(b.crsefx)
;
quit;
Slightly simpler version:
proc sql noprint _method;
create table desired3 as
select a.*, b.crsecollege, b.crsedesc
from crselist a, crseinfo b
where a.crsenme = b.crsenme
and a.crsenum = b.crsenum
and a.term ge b.crsefx
group by a.crsenme, a.crsenum, a.term, a.section
having b.crsefx = max(b.crsefx)
;
quit;
These produce the same result, but the latter one is more readily optimised into a hash join.