Search code examples
sasdate-formattingdatastep

How to convert date format from imported CSV to be able to merge data


I'm working on an assignment & not allowed to use PROC SQL. I imported data from two csv files & renamed columns at the same time (see code below) but when I tried to PROC APPEND the two tables, there's an issue with the date format. When I PROC CONTENTS I get the following info:

Work.2019data: Variable:date Type:Num Length:8 Format/Informat:MMDDYY10 When I open this file in notepad, the dates appears like this: 12/31/2019

For the second table:

Work.2020data: Variable:date Type:Num Length:8 Format/Informat:YYMMDD10 But when I open this file in notepad, the dates appears like this: 2020-11-16

PROC IMPORT
    DATAFILE= "&export_mtl/2019data.csv"
    OUT= WORK.2019data
    (RENAME=(new_cases=nouveaux_cas
             new_deaths=nouveaux_deces
             new_tests=nouveaux_tests
             total_tests=nb_total_tests
             female_smokers=femmes_fumeuses
             male_smokers=hommes_fumeurs
            ))
    DBMS= csv
    REPLACE;guessingrows=10000;
    GETNAMES= YES;
RUN;

PROC IMPORT
    DATAFILE= "&export_mtl/2020data.csv"
    OUT= WORK.2020data
(RENAME=(new_cases=nouveaux_cas
             new_deaths=nouveaux_deces
             new_tests=nouveaux_tests
             total_tests=nb_total_tests
             female_smokers=femmes_fumeuses
             male_smokers=hommes_fumeurs
            ))
    DBMS= csv
    REPLACE;guessingrows=10000;
    GETNAMES= YES;
RUN;

What's the simplest way to cast the date in the 2020data table so that I can concatenate the two tables after? I've seen so many ways of doing this & tried them with no luck.


Solution

  • If you combine two datasets that have the same variable then the default format attached to the variable will be first non missing format that the data step compiler sees. So for your example if you run

    data want;
       set '2019data'n '2020data'n ;
    run;
    

    then the MMDDYY10. format will be used. But if you reverse the order that the input datasets are referenced

    data want;
       set '2020data'n '2019data'n ;
    run;
    

    then the YYMMDD10. format will be used.

    But if you add a FORMAT statement to the data step then you can control what format will be attached in the new dataset. So if you used:

    data want;
       set '2020data'n '2019data'n ;
       format date date9.;
    run;
    

    Now the default style that will be used to display the values of DATE in WANT will be ddMONyyyy.