Search code examples
stringdatesassubstringstring-length

SAS 9.4 Character functions - Why might one cell return unexpected results?


I am manipulating some string variables and certain cells are returning unexpected values with substring and length functions. These cells hold character-formatted dates, as I need to do a few manipulations before converting them to SAS dates, because of the nature of the Excel file they're coming from. Here is an example:

HAVE:
Obs _orig
1 4/3
2 12/16
3 1/13
4 6/2
5 3/10
6 5/4
7 10/14

WANT:
Obs _orig _length _sub_1_2
1 4/3 3 4/
2 12/16 5 12
3 1/13 4 1/
4 6/2 3 6/
5 3/10 4 3/
6 5/4 3 5/
7 10/14 5 10

I am using this code:

data want;
set have;
_strip=strip(_orig);
_sub_1_2=substr(_strip,1,2);
_length=length(_strip);
run;

This is what I get. The discrepancies are bolded.
Obs _orig _length _sub_1_2
1 4/3 5
2 12/16 5 12
3 1/13 4 1/
4 6/2 3 6/
5 3/10 4 3/
6 5/4 5
7 10/14 5 10

Both are cases where SAS calculates length=5 when length should = 3. In both cases, the value for the substring-derived variable is blank altogether. Results are the same if I use compress(), trim(), or trimn() in my code, rather than strip(). Thank you for any help you can provide


Solution

  • Sounds like maybe unprintable characters got in your data. If you PUT _orig $hex.; to the log, what do you see? Should be: 342F332020

    152  data want;
    153   length orig $5;
    154   orig='4/3';
    155   len=length(orig);
    156   put orig= len=;
    157   put orig hex.;
    158  run;
    
    orig=4/3 len=3
    342F332020
    

    To get rid of non-printable characters, you could try:

    _strip=compress(orig,,'kw');