I am looking at two DATA steps that import two .txt files in SAS. The first file is fixed width; the second is delimited. The SAS code is below:
DATA filename;
INFILE "filelocation";
INPUT
VAR1 $1-11
VAR2 $13-16
@18 VAR3 MMDDYY10.
VAR4 $29-53;
INFORMAT VAR1 $11.;
INFORMAT VAR2 $4.;
INFORMAT VAR3 MMDDYY10.;
INFORMAT VAR4 $25.;
FORMAT VAR1 $11.;
FORMAT VAR2 $4.;
FORMAT VAR3 MMDDYY10.;
FORMAT VAR4 $25.;
;
RUN;
DATA filename;
INFILE "filelocation" DELIMITER="|" MISSOVER
DSD LRECL=32767;
INFORMAT VAR1 $11.;
INFORMAT VAR2 $4.;
INFORMAT VAR3 MMDDYY10.;
INFORMAT VAR4 $25.;
FORMAT VAR1 $11.;
FORMAT VAR2 $4.;
FORMAT VAR3 MMDDYY10.;
FORMAT VAR4 $25.;
INPUT
VAR1 $
VAR2 $
VAR3
VAR4 $
;
RUN;
My questions are:
1. Why is the INPUT statement at the beginning of the first DATA step but at the end of the second? Does the position of the INPUT statement matter?
2. In the first DATA step there is an @18 in front of VAR3 (a variable that holds a date), which tells SAS that VAR3 starts in column 18. Can all of the variables use this notation? For example:
@1 VAR1 $
@13 VAR2 $
@18 VAR3 MMDDYY10.
@29 VAR4 $;
3. In the second DATA step,
INPUT
VAR1 $
VAR2 $
VAR3
VAR4 $
why is there no number after the $ sign to set the length of each variable?
Thank you!
The main difference you are asking about is the difference between data that is stored in FIXED column locations and data that is DELIMITED. Since your first example uses data with fixed column locations, you can use column ranges (1-11) to read the data. With delimited data you cannot specify fixed columns (or even fixed lengths to read), since you do not know how many characters there are between the delimiters. Instead you must use list-mode input, and SAS will read each value up to the next delimiter.
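A minimal sketch of the two layouts, assuming in-line DATALINES in place of your external files: the same two made-up records are read once from fixed columns and once pipe-delimited. The data set names FIXED and DELIM and the sample values are hypothetical.

```sas
/* Fixed-column layout: every field always sits in the same columns,
   so column ranges work and embedded blanks are kept. */
DATA fixed;
  INFILE DATALINES TRUNCOVER;
  INPUT VAR1 $ 1-11 VAR2 $ 13-16 @18 VAR3 MMDDYY10. VAR4 $ 29-53;
  FORMAT VAR3 MMDDYY10.;
DATALINES;
ABC DEF GHI WXYZ 03/15/2021 Some description text
XYZ 123 456 QRST 12/01/2020 Another value
;

/* Delimited layout: field widths vary, so list input reads each value
   up to the next pipe. The LENGTH statement comes before INPUT so the
   character variables get more than the default 8 bytes. */
DATA delim;
  INFILE DATALINES DLM='|' DSD MISSOVER;
  LENGTH VAR1 $11 VAR2 $4 VAR3 8 VAR4 $25;
  INFORMAT VAR3 MMDDYY10.;
  FORMAT VAR3 MMDDYY10.;
  INPUT VAR1 $ VAR2 $ VAR3 VAR4 $;
DATALINES;
ABC DEF GHI|WXYZ|03/15/2021|Some description text
XYZ 123 456|QRST|12/01/2020|Another value
;
```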
Let's tackle the detailed questions.
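The important thing to understand about the order of statements when building a dataset is the impact that the order might have on the result. INFORMAT, FORMAT, LENGTH and ATTRIB are declarations that take effect when the DATA step is compiled, while INPUT is executed for every record, but SAS still defines each variable (its type, its length and its position in the dataset) from the first statement in which the compiler sees it. So if you place a FORMAT or INFORMAT statement before your INPUT statement, it can determine both the type and length of the variables SAS creates and the order in which they are created in the data step.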
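To see the effect of statement order, here is a small sketch that reads the same made-up value with the INFORMAT statement placed before and after INPUT. The data set names INFORMAT_FIRST and INPUT_FIRST are hypothetical.

```sas
/* Declaration first: the compiler meets INFORMAT VAR1 $11. before INPUT,
   so VAR1 is created as character with length 11. */
DATA informat_first;
  INFILE DATALINES DLM='|' DSD MISSOVER;
  INFORMAT VAR1 $11.;
  INPUT VAR1 $;
DATALINES;
ABC DEF GHI
;

/* INPUT first: list input with a bare $ creates VAR1 with the default
   length of 8, so the stored value is cut to 'ABC DEF'. The INFORMAT
   statement that follows cannot change the length any more. */
DATA input_first;
  INFILE DATALINES DLM='|' DSD MISSOVER;
  INPUT VAR1 $;
  INFORMAT VAR1 $11.;
DATALINES;
ABC DEF GHI
;

/* Compare the variable lengths of the two data sets. */
PROC CONTENTS DATA=input_first;
RUN;
```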
When you ask it to read VAR1 $ 1-11, you are asking it to read whatever is in columns 1 to 11, including any embedded blanks. It also knows that you want VAR1 to be character (since you used the $) and that it should have room for 11 bytes. When you ask it to read @1 VAR1 $, it will read the next word it sees starting at column 1 and stop at the first blank. So it might read columns 1 to 5, or it might read columns 70 to 77 if columns 1 to 69 are blank. It will also give VAR1 a length of only 8 (unless you defined it earlier), since that is the default for character variables when SAS cannot tell that you want a different length.
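The difference between column input and list input from a pointer shows up clearly when one made-up record is read both ways. COL_VAR1 and LIST_VAR1 are hypothetical names used only for the comparison.

```sas
DATA compare;
  INFILE DATALINES TRUNCOVER;
  INPUT COL_VAR1 $ 1-11   /* column input: columns 1-11, blanks kept, length 11 */
        @1 LIST_VAR1 $;   /* list input from column 1: stops at the first blank, length 8 */
DATALINES;
ABC DEF GHI WXYZ 03/15/2021 Some description text
;
/* COL_VAR1 holds 'ABC DEF GHI'; LIST_VAR1 holds only 'ABC'. */
```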
The original program used @18 VAR3 MMDDYY10. because you need to specify the informat to have SAS properly convert the text in the data into the number that SAS uses to represent that date, and you cannot specify an informat with a column range.
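So every variable can be positioned with an @ pointer, but to read a whole fixed-width field (embedded blanks included) you pair the pointer with an informat width rather than a bare $. Here is a sketch of the first program using formatted input for all four variables, assuming the same column layout and made-up DATALINES; the data set name FIXED_FORMATTED is hypothetical.

```sas
DATA fixed_formatted;
  INFILE DATALINES TRUNCOVER;
  /* @n moves the pointer to column n; the informat width says how many
     columns to read. $w. keeps embedded blanks and, as the first
     reference, sets the character length to w. */
  INPUT @1  VAR1 $11.
        @13 VAR2 $4.
        @18 VAR3 MMDDYY10.
        @29 VAR4 $25.;
  FORMAT VAR3 MMDDYY10.;
DATALINES;
ABC DEF GHI WXYZ 03/15/2021 Some description text
;
```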
In the second program you do not need a length after the $, or even the $ itself, since you have already defined the variable type. You also set the length of each variable the first time it was referenced, in the INFORMAT statements that come before INPUT. So those INFORMAT statements have had the side effect of setting the length of each variable in addition to naming the informat that should be used to convert the text being read. If you really want to define your variables, you should use a LENGTH or ATTRIB statement.
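As one possible way to do that, here is the second program rewritten with a single ATTRIB statement ahead of INPUT. It assumes in-line DATALINES instead of your file; the data set name DELIMITED and the sample values are made up.

```sas
DATA delimited;
  INFILE DATALINES DLM='|' DSD MISSOVER;
  /* ATTRIB declares type, length, informat and format in one place,
     before INPUT, so the INPUT statement needs no $ markers. */
  ATTRIB VAR1 LENGTH=$11
         VAR2 LENGTH=$4
         VAR3 LENGTH=8 INFORMAT=MMDDYY10. FORMAT=MMDDYY10.
         VAR4 LENGTH=$25;
  INPUT VAR1 VAR2 VAR3 VAR4;
DATALINES;
ABC DEF GHI|WXYZ|03/15/2021|Some description text
XYZ 123 456|QRST|12/01/2020|Another value
;
```

A LENGTH statement followed by INFORMAT and FORMAT statements for VAR3 alone would work just as well; ATTRIB simply keeps all the attributes together.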