I am trying to create a SAS Data Set from a text file. The text file shows data in a format exactly like this:
-HEADER HEADER HEADER
-HEADER HEADER HEADER
April SpringRace Male
$$$$$$$$$$$$$$$$$$$$
Name Age State /these are titles in the text file/
$$$$$$$$$$$$$$$$$$$$
John Smith 30 CA
Mark Doe 49 TX
May SpringRace2 Female
$$$$$$$$$$$$$$
Name Age State
$$$$$$$$$$$$$$
Betty White 50 ME
Jane Smith 37 NY
The issue I am having going through the data step is: by-passing varying header rows and then collecting the "event" data before the ****** titles ******* as variables then skipping over the titles and assigning variables for the actual people. It is a similar format throughout the huge text file. Please can anyone point me in the right direction?
I have been experimenting: Data work.test; infile c:\tester dlm=' , $' missover; input / / / Month $15. EventName $15. Gender $6. (This is where I get stuck as I do not know how to skip the "Name Age State" in the text file and just assign variables to "John Smith 30 CA" etc.) run;
I also think there must be a better way to get passed the headers as there is no certainty that they will always only be 2 rows long.
Thanks
I think that using @'my_char_string' column pointer in an INPUT statement would help you, if the titles that separate data values always repeat and you know what they are. For example:
INFILE mydatafile FLOWOVER FIRSTOBS=2;
INPUT month $ race $ sex $ @'State' first_name $ last_name $ address $;
The FIRSTOBS=2 option in INFILE statement skips the HEADER HEADER... row, and the FLOWOVER option tells SAS to keep looking for data on next line, in particular for @'State'. You may need to specify additional options and formatting, depending on your input file format, delimiters etc.
Per your edits, you could use the month value to determine that you are reading the start of an event, and then, using trailing @, retain and some conditional logic, read in your participants on separate lines and retain the event info across the participants, like this (just add all the remaining month names in the first if clause):
data test1;
length test $20 month $20 event $20 gender $20 firstname $20 lastname $20 state $2;
infile "test1.txt" DLM=' $' FIRSTOBS=5;
retain month event gender; * Keep these values from last readin;
input test $ @; /* Read in the first word in the data line being
read into test var, and stay on this line for
now (with @)*/
if strip(test) in('April', 'May') then do; /* If test var contains month,
then read in all of the variables,
and skip the name/age/state titles row*/
input @1 month $ event $ gender $ @'State' firstname $ lastname $ age state $ ;
end;
else do; /* Otherwise, the data line being read in should contain
only names, age and state, so read in those values only.
The month, event and gender values will be kept the same
by the retain statement above.*/
input @1 firstname $ lastname $ age state $ ;
end;
drop test; /* Comment out this drop statement to see whats in test var*/
run;
This code will work with varying numbers of participants per event. But the month cannot be missing in order for this code to work.
Helpful tip: To see what is in the current data line being read in by SAS, try adding
put _INFILE_;
after the INFILE statement. It will print the data lines to your log the way SAS sees them.