Search code examples
mysqldatabase-designsasproc-sqldimensional-modeling

SAS/PROC-SQL Convert from table with unique key to table with multiple rows has the same key


Currently I have a table like below:

U_ID SPOUSEDOB   FCHILDDOB   SCHILDDOB   ChangeDate
1    20/01/1980  01/01/1900  01/01/1900  01/01/2000
2    20/01/1950  20/01/1970  01/01/1900  01/01/2000
3    20/01/1960  20/01/1990  20/01/1995  01/01/2000
1    20/01/1980  20/01/1995  01/01/1900  01/01/2005
1    20/01/1980  20/01/1995  20/01/2006  01/01/2010

The date 01/01/1900 which means there is no spouse/child. I want to convert this table like below:

Member_ID  U_ID  Relation DOB         ChangeDate
1          1     Spouse   20/01/1980  01/01/2000
2          2     Spouse   20/01/1950  01/01/2000
3          2     Child    20/01/1970  01/01/2000
4          3     Spouse   20/01/1960  01/01/2000
5          3     Child    20/01/1990  01/01/2000
6          3     Child    20/01/1995  01/01/2000
7          1     Child    20/01/1995  01/01/2005
8          1     Child    20/01/2006  01/01/2010

But this table still could not provide the best way to answer this question at the specific time (01/01/2006) and (01/01/2011), user 1 had how many children? The answer would be 1 and 2. And I also find it difficult to convert from the table 1 to table 2, I'm stuck at how to create new row for the same user_id. Any idea on how to improve this situation or resolve the problem at converting table? Help is really appreciated. Thank you in advance.


Solution

  • Here's a simple SAS datastep. You can adjust it to use VNAME() to define relation (depending on how your other variables are named); for example,

    relation = vname(DOBs[_t]);
    

    Then use SUBSTR or whatever to shorten it to the proper text. Other than that it should be sufficiently flexible to handle any number of relations in the initial HAVE dataset.

    data want;
    set have;
    array DOBs SPOUSEDOB   FCHILDDOB   SCHILDDOB;
    do _t = 1 to dim(DOBs);
      if DOBs[_t] ne '01JAN1900'd then do;
        relation=ifc(_t=1,'Spouse','Child'); *this could also be done using VNAME() to be more flexible;
        DOB=DOBs[_t];
        output;
      end;
    end;
    keep relation DOB ChangeDate U_ID;
    format DOB Changedate Date9.;
    run;
    
    proc sort data=want;
    by u_id descending relation dob changedate;
    run;
    
    
    data final;
    set want;
    by u_id descending relation dob changedate;
    if first.dob;
    run;
    

    Then to process it to select only people born as of a certain date you can use the query fthiella posted if you prefer to use SQL, or you can filter in a SAS proc, like:

    proc means data=final;
    where dob le '01JAN2006'd;
    class relation;
    var (whatever);
    run;
    

    Or use ChangeDate if that is what you want to filter on rather than actual DOB.