I am manipulating some string variables and certain cells are returning unexpected values with substring and length functions. These cells hold character-formatted dates, as I need to do a few manipulations before converting them to SAS dates, because of the nature of the Excel file they're coming from. Here is an example:
HAVE:
Obs _orig
1 4/3
2 12/16
3 1/13
4 6/2
5 3/10
6 5/4
7 10/14
WANT:
Obs _orig _length _sub_1_2
1 4/3 3 4/
2 12/16 5 12
3 1/13 4 1/
4 6/2 3 6/
5 3/10 4 3/
6 5/4 3 5/
7 10/14 5 10
I am using this code:
data want;
set have;
_strip=strip(_orig);
_sub_1_2=substr(_strip,1,2);
_length=length(_strip);
run;
This is what I get. The discrepancies are bolded.
Obs _orig _length _sub_1_2
1 4/3 5
2 12/16 5 12
3 1/13 4 1/
4 6/2 3 6/
5 3/10 4 3/
6 5/4 5
7 10/14 5 10
Both are cases where SAS calculates length=5 when length should = 3. In both cases, the value for the substring-derived variable is blank altogether. Results are the same if I use compress(), trim(), or trimn() in my code, rather than strip(). Thank you for any help you can provide
Sounds like maybe unprintable characters got in your data. If you PUT _orig $hex.;
to the log, what do you see? Should be: 342F332020
152 data want;
153 length orig $5;
154 orig='4/3';
155 len=length(orig);
156 put orig= len=;
157 put orig hex.;
158 run;
orig=4/3 len=3
342F332020
To get rid of non-printable characters, you could try:
_strip=compress(orig,,'kw');