Search code examples
importsasprocsas-studio

Import Error even when variable is dropped SAS


I'm importing a semi-colon delimited file as such

ID    Segment    Number    Date     Payment
1     A1         103RTR    10OCT17  10
2     A1         205FCD    11OCT17  11
...

the SAS doesn't like the mixture of numbers and characters when I import this txt file using this code:

proc import
out=want (drop=Number)
datafile="have"
dbms=dlm
replace;
delimiter=';';
options validvarname=v7 missing='';
run;

Even though i'm not trying to load in Number, which in the real dataset is much longer, like 12 numbers followed by four characters, it returns this error in the log

NOTE: Invalid data for Number in line 22157 21-30.
 WARNING: Limit set by ERRORS= option reached.  Further errors of this type will not be printed.
 ERROR: Import unsuccessful.  See SAS Log for details.

I would like to do a typical infile and informat but with having 32 variables and 2 million rows, I just cannit be taking the time to find out what range and style each variable needs to be read in. so I am asking whether there's a way to format that particular variable but sticking with the ease of proc import.

But I'm also asking whether this actually impacts my import? as the data seems fine when checking the output.


Solution

  • I would like to do a typical infile and informat but with having 32 variables and 2 million rows, I just cannit be taking the time to find out what range and style each variable needs to be read in. so I am asking whether there's a way to format that particular variable but sticking with the ease of proc import.

    Bad idea, garbage in = garbage out and you're only dealing with 32 variables so that's actually not that bad. Take the time to clean and import the data correctly pays off and you learn about the data in the process which speeds up further analysis. This step is not a waste of time. After importing a data set, its a good idea to run a PROC MEANS and PROC FREQ and review the output to ensure it was read correctly.

    proc means data=have;
    run;
    
    proc freq data=have;
    run;
    

    Set GUESSINGROWS=MAX in the PROC IMPORT. This forces SAS to scan the whole file before importing it, which will then be more likely correct. If you're automating this process and reading the file more than once, then take the code from the log and use that instead of PROC IMPORT, once you've verified the data.

    And the option statement should not be within the PROC IMPORT step, it goes before.

    options validvarname=v7 missing='';
    
    proc import
    out=want (drop=Number)
    datafile="have"
    dbms=dlm
    replace;
    delimiter=';';
    guessingrows=max;
    run;