Our database predates our database software having good Unicode support. In its place it uses a pseudo-base64 encoding to store UTF-16 characters in an ASCII field. I am writing a function to convert this type of field into straight UTF-8 within SAS.
The function loops through the string, converting each set of three ASCII characters into a Unicode character and placing it in an array. When experimenting with code in a data step I had used cat(of final{*}) to convert the array into a string, but the same code does not appear to be valid within a function.
I am currently collating the string in the loop with collate = trim(collate)!!trim(final{i}) and an arbitrary-length collate string, but I would like to produce this directly from the array, or at least set the size of the collate string based on the length of the input string.
I've included a pastebin of the data and function here.
Edit: The version of SAS I was using is 9.3
The same code is valid in a function in SAS 9.4 TS1M3; it may not be in earlier versions (significant changes were made to how arrays are handled in FCMP in 9.4 and in maintenance releases TS1M2 and TS1M3).
However, this doesn't really solve your arbitrary-length problem; when I run your function with
outtext = cat(of final{*});
return (outtext);
I get... 1 character! And when I run
return(cats(of final{*}));
the output is:
Obs  text_enc                                                      finaltext
  1  ABCABlABjABhAB1ABzABlAAgABVABUABGAA4AAgABpABzAAgABoABhAByABk  BecauseU
  2  ABTABpABtABwABsABlAByAAgABsABpABrABlAAgAB0ABoABpABz           Simplerl
  3  ABJABvAAgABJABvAAgABCAByABvABtABpABvABz                       IoIoBrom
That is a bit better (cats trims for you), but I still only get 8 characters. That's because 8 characters is the default length in SAS for an undeclared character variable. Expand the length (using a length statement for outtext) and you get:
Obs  text_enc                                                      finaltext
  1  ABCABlABjABhAB1ABzABlAAgABVABUABGAA4AAgABpABzAAgABoABhAByABk  BecauseUTF8ishard
  2  ABTABpABtABwABsABlAByAAgABsABpABrABlAAgAB0ABoABpABz           Simplerlikethis
  3  ABJABvAAgABJABvAAgABCAByABvABtABpABvABz                       IoIoBromios
So you'll still need to declare whatever length you need. As far as I know, FCMP offers no way to return an undefined-length string; you need to define the default (and maximum) length for the string you're going to return. The user is welcome to define a shorter length, and should when it's appropriate.
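To illustrate where those lengths go, here is a minimal sketch. It is not your function: join3, the demo package name, and the length of 200 are all made up for the example, and it concatenates three plain arguments instead of decoding into an array. The point is only that the length appears twice, once on the function statement (the returned string) and once in a length statement (the working variable).

```sas
/* Sketch only: join3, work.funcs.demo and the length 200 are illustrative assumptions. */
proc fcmp outlib=work.funcs.demo;
    /* "$ 200" declares a character return value with a maximum length of 200 */
    function join3(a $, b $, c $) $ 200;
        /* declare the working variable explicitly instead of relying on the default length */
        length outtext $ 200;
        /* cats strips leading and trailing blanks from each argument before joining */
        outtext = cats(a, b, c);
        return (outtext);
    endsub;
run;

/* tell SAS where to find the compiled function */
options cmplib=work.funcs;

data _null_;
    /* the caller may declare a shorter length here when appropriate */
    length finaltext $ 200;
    finaltext = join3('Because', 'UTF8', 'ishard');
    put finaltext=;
run;
```

In your function you would pick a value large enough for the longest decoded field you expect, since each three-character group in the input decodes to one character.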