Search code examples
sqlsasscd

How to join a fact table to a (Kimball type 2?) slowly changing date in SAS


New to SQL - I'd like to join the fact table crselist to the crseinfo table to get the correct dimension info. I've been working on some correlated subqueries but none give the desired result (below). The crseinfo table says that beginning in 199610 Art 508 belongs to college 09 and should be called OkArt..which updates in 200220 and 200300. Crselist lists the courses actually taught.

data crseinfo ; 
input crsenme $ crsenum crsefx crsecollege $ crsedesc $9.;
cards;
ART 508 199610 09 OkArt
ART 508 200220 18 WowItsArt
ART 508 200300 18 SuperArt
;
run;

data crselist; 
input  crsenme $ crsenum term section $; 
cards;
ART 508 199610 01
ART 508 199610 02
ART 508 199610 03
ART 508 199710 01
ART 508 200220 01
ART 508 200220 02
ART 508 201020 01
ART 508 201120 01
;
run;

The desired result would then be:

data desired ; 
input  crsenme $ crsenum term section $ crsecollege $ crsedesc $9.;
cards;
ART 508 199610 01 09 OkArt
ART 508 199610 02 09 OkArt
ART 508 199610 03 09 OkArt
ART 508 199710 01 09 OkArt
ART 508 200220 01 18 WowItsArt
ART 508 200220 02 18 WowItsArt
ART 508 201020 01 18 SuperArt
ART 508 201120 01 18 SuperArt
;

Referring to the SAS help page (http://web.utk.edu/sas/OnlineTutor/1.2/en/60477/m70/m70_52.htm) it would seem like I could do something like:

proc sql ; 
select * 
from crseinfo a, crselist b
where a.crsenme eq b.crsenme and 
  a.crsenum eq b.crsenum and 
  b.term eq (select min(c.term) 
   from crselist c 
   where c.term ge a.crsefx )
   ;
quit;

But this does not work. I am interested in a SQL-based solution - Thank-you for your time.


Solution

  • You're nearly there. Rather than using a correlated subquery, I think it's simpler to do this using a combination of having and group by clauses:

    proc sql noprint _method;
        create table desired2 as
            select a.*, b.crsecollege, b.crsedesc
                from  crselist a left join crseinfo b
                    on a.crsenme = b.crsenme and a.crsenum = b.crsenum
                        where a.term ge b.crsefx
                            group by a.crsenme, a.crsenum, a.term
                                having b.crsefx = max(b.crsefx)
    ;
    quit;
    

    Slightly simpler version:

    proc sql noprint _method;
        create table desired3 as
            select a.*, b.crsecollege, b.crsedesc
                from  crselist a, crseinfo b
                    where       a.crsenme = b.crsenme 
                                and a.crsenum = b.crsenum 
                                and a.term ge b.crsefx
                        group by a.crsenme, a.crsenum, a.term, a.section
                            having b.crsefx = max(b.crsefx)
    ;
    quit;
    

    These produce the same result, but the latter one is more readily optimised into a hash join.