Search code examples
sasleft-joininner-joinproc-sqlsas-studio

Left Join collapses data


I am working with some bonds data and I'm looking to left join the interest rate projections. my data set for the bonds date looks like:

data have;
input ID Vintage Reference_Rate Base2017;
Datalines;

1    2017    LIBOR_001M    0.01
1    2018    LIBOR_001M    0.01
1    2019    LIBOR_001M    0.01
1    2020    LIBOR_001M    0.01

2    2017    LIBOR_003M    0.012
2    2018    LIBOR_003M    0.012
2    2019    LIBOR_003M    0.012
2    2020    LIBOR_003M    0.012

3    2017    LIBOR_006M    0.014
3    2018    LIBOR_006M    0.014
3    2019    LIBOR_006M    0.014
3    2020    LIBOR_006M    0.014
;
run;

the second dataset which I am looking to left join (or even full join) looks like

data have2;
input Reference_rate Base2018 Base2019 Base2020;
datalines;

LIBOR_001M 0.011 0.012 0.013
LIBOR_003M 0.013 0.014 0.015
LIBOR_006M 0.015 0.017 0.019
;
run;

the dataset I've been getting collapses the vintage into 1 and messes up the rest of the analysis I've been running such that it looks like

data dontwant;
input ID Vintage Reference_rate Base2017 Base2018 Base2019 Base2020;
datalines;
      1    2017    LIBOR_001M    0.01    0.011    0.012    0.013
      2    2017    LIBOR_003M    0.012   0.013    0.014    0.015
      3    2017    LIBOR_006M    0.014   0.015    0.017    0,019
run;

the dataset I would like looks like this

data want;
input input Reference_rate Base2018 Base2019 Base2020;
datalines;

1    2017    LIBOR_001M    0.01    0.011    0.012    0.013
1    2018    LIBOR_001M    0.01    0.011    0.012    0.013
1    2019    LIBOR_001M    0.01    0.011    0.012    0.013
1    2020    LIBOR_001M    0.01    0.011    0.012    0.013

2    2017    LIBOR_003M    0.012   0.013    0.014    0.015
2    2018    LIBOR_003M    0.012   0,013    0.014    0.015
2    2019    LIBOR_003M    0.012   0.013    0.014    0.015
2    2020    LIBOR_003M    0.012   0.013    0.014    0.015

3    2017    LIBOR_006M    0.014   0.015   0.017    0.019
3    2018    LIBOR_006M    0.014   0.015   0.017    0.019
3    2019    LIBOR_006M    0.014   0.015   0.017    0.019
3    2020    LIBOR_006M    0.014   0.015   0.017    0.019
;
run;

the code I have been using is a pretty standard proc sql

PROC SQL;
CREATE TABLE want AS
SELECT a.*, b.*
FROM have A LEFT JOIN have2 B
ON A.reference_rate = B.reference_rate
ORDER BY reference_rate;
QUIT;

Solution

  • It's good practice to avoid using Select *, as it's better for the query performance and to avoid the case of having the same column name in both tables.

    I ran your same code and it worked fine, except for one warning because you are using select a.* & b.*; you have the field "Reference_Rate" in both tables.

    Solution:

    PROC SQL;
    CREATE TABLE want AS 
    SELECT 
        a.ID,
        a.Vintage,
        a.Reference_Rate,
        b.Base2018, 
        b.Base2019, 
        b.Base2020
    FROM have A LEFT JOIN have2 B
    ON A.reference_rate = B.reference_rate
    ORDER BY reference_rate;
    QUIT;
    

    Tip:

    You can print the SAS table values to the log using Put _ALL_ The code below will not create a table, it will only print the table to the log which is good for debugging small tables.

    data _null_;
    set want;
    put _all_;
    run;
    

    Log:

    ID=1 Vintage=2019 Reference_Rate=LIBOR_001M Base2018=0.011 Base2019=0.012 Base2020=0.013 _ERROR_=0 _N_=1
    ID=1 Vintage=2020 Reference_Rate=LIBOR_001M Base2018=0.011 Base2019=0.012 Base2020=0.013 _ERROR_=0 _N_=2
    ID=1 Vintage=2017 Reference_Rate=LIBOR_001M Base2018=0.011 Base2019=0.012 Base2020=0.013 _ERROR_=0 _N_=3
    ID=1 Vintage=2018 Reference_Rate=LIBOR_001M Base2018=0.011 Base2019=0.012 Base2020=0.013 _ERROR_=0 _N_=4
    ID=2 Vintage=2019 Reference_Rate=LIBOR_003M Base2018=0.013 Base2019=0.014 Base2020=0.015 _ERROR_=0 _N_=5
    ID=2 Vintage=2018 Reference_Rate=LIBOR_003M Base2018=0.013 Base2019=0.014 Base2020=0.015 _ERROR_=0 _N_=6
    ID=2 Vintage=2017 Reference_Rate=LIBOR_003M Base2018=0.013 Base2019=0.014 Base2020=0.015 _ERROR_=0 _N_=7
    ID=2 Vintage=2020 Reference_Rate=LIBOR_003M Base2018=0.013 Base2019=0.014 Base2020=0.015 _ERROR_=0 _N_=8
    ID=3 Vintage=2020 Reference_Rate=LIBOR_006M Base2018=0.015 Base2019=0.017 Base2020=0.019 _ERROR_=0 _N_=9
    ID=3 Vintage=2019 Reference_Rate=LIBOR_006M Base2018=0.015 Base2019=0.017 Base2020=0.019 _ERROR_=0 _N_=10
    ID=3 Vintage=2018 Reference_Rate=LIBOR_006M Base2018=0.015 Base2019=0.017 Base2020=0.019 _ERROR_=0 _N_=11
    ID=3 Vintage=2017 Reference_Rate=LIBOR_006M Base2018=0.015 Base2019=0.017 Base2020=0.019 _ERROR_=0 _N_=12