Search code examples
datesasinformat

date and time strings cannot be read correctly in SAS


I have the following sample data to read into SAS

2012-05-0317:36:00NYA
2012-05-0410:29:00SNW
2012-05-2418:45:00NYA
2012-05-2922:24:00NSL
2012-05-3107:26:00DEN
2012-05-2606:10:00PHX
2012-05-0202:30:00FTW
2012-05-0220:45:00HOB
2012-05-0103:01:00HGR
2012-05-0120:30:00RCH
2012-05-1112:00:00NAS

However, there is a strange problem bothering me. Here is my first try.

data test;
    informat DT yymmdd10.
        TM $TIME8.
        orig $3.
    ;
    format DT yymmddd10.
        TM TIME8.
        orig $3.
    ;
    input
        @1 DT_temp 
        @11 TM_temp 
        @19 orig 
    ;
    datalines;
2012-05-0317:36:00NYA
2012-05-0410:29:00SNW
2012-05-2418:45:00NYA
2012-05-2922:24:00NSL
2012-05-3107:26:00DEN
2012-05-2606:10:00PHX
2012-05-0202:30:00FTW
2012-05-0220:45:00HOB
2012-05-0103:01:00HGR
2012-05-0120:30:00RCH
2012-05-1112:00:00NAS
run;

The result shows

DT TM orig
.   .   NYA
.   .   SNW
.   .   NYA
.   .   NSL
.   .   DEN
.   .   PHX
.   .   FTW
.   .   HOB
.   .   HGR
.   .   RCH
.   .   NAS

This means the date and time are not read correctly. A work around I have right now is to read everything as string first and then convert it to date and time respectively.

data test;
    informat DT_temp $10.
        TM_temp $8.
        orig $3.
    ;
    format DT yymmddd10.
        TM TIME8.
        orig $3.
    ;
    input
        @1 DT_temp 
        @11 TM_temp 
        @19 orig 
    ;
    DT=input(strip(DT_temp),yymmdd10.);
    TM=input(strip(TM_temp),time8.);

    drop DT_temp TM_temp;
    datalines;
2012-05-0317:36:00NYA
2012-05-0410:29:00SNW
2012-05-2418:45:00NYA
2012-05-2922:24:00NSL
2012-05-3107:26:00DEN
2012-05-2606:10:00PHX
2012-05-0202:30:00FTW
2012-05-0220:45:00HOB
2012-05-0103:01:00HGR
2012-05-0120:30:00RCH
2012-05-1112:00:00NAS
run;

In this way, everything gets the correct format.

orig DT TM
NYA 2012-05-03  17:36:00
SNW 2012-05-04  10:29:00
NYA 2012-05-24  18:45:00
NSL 2012-05-29  22:24:00
DEN 2012-05-31  7:26:00
PHX 2012-05-26  6:10:00
FTW 2012-05-02  2:30:00
HOB 2012-05-02  20:45:00
HGR 2012-05-01  3:01:00
RCH 2012-05-01  20:30:00
NAS 2012-05-11  12:00:00

Basically, these two methods used the same informat. I was wondering why the first method does not work. Appreciate for any kind of help. Thank you very much.


Solution

  • Your "first try" code has a couple errors, but I'm guessing they were introduced while writing the question.

    Because you are using column-oriented input, you need to specify the format to be used for each variable. Here is a corrected version:

    data test;
        informat DT yymmdd10.
            TM TIME8.
            orig $3.
        ;
        format DT yymmddd10.
            TM TIME8.
            orig $3.
        ;
        input
            @1 DT  yymmdd10.
           @11 TM  TIME8.
           @19 orig $3.
        ;
        datalines;
    2012-05-0317:36:00NYA
    2012-05-0410:29:00SNW
    2012-05-2418:45:00NYA
    2012-05-2922:24:00NSL
    2012-05-3107:26:00DEN
    2012-05-2606:10:00PHX
    2012-05-0202:30:00FTW
    2012-05-0220:45:00HOB
    2012-05-0103:01:00HGR
    2012-05-0120:30:00RCH
    2012-05-1112:00:00NAS
    run;