
Unicode character transformation in SPSS


I have a string variable and need to convert all non-digit characters to spaces (" "). The problem is Unicode characters: characters outside the basic (ASCII) character set are converted into invalid characters. See the code below for an example.

Is there another way to achieve the same result that would not choke on special Unicode characters?

    new file.

    set unicode = yes.
    show unicode.

    data list free
     /T (a10).
    begin data
    1234
    5678
    absd
    12as
    12(a
    12(vi
    12(vī
    12āčž
    end data.

    string Z (a10).
    comp Z = T.

    loop #k = 1 to char.len(Z).
    if ~range(char.sub(Z, #k, 1), "0", "9") sub(Z, #k, 1) = " ".
    end loop.

    comp Z = normalize(Z).

    comp len = char.len(Z).

    list var = all.

    exe.

The result:

    T          Z               len

    1234       1234              4
    5678       5678              4
    absd                         0
    12as       12                2
    12(a       12                2
    12(vi      12                2
    12(vī      12   �            6

    Warning # 649
    The first argument to the CHAR.SUBSTR function contains invalid characters.
    Command line: 1939  Current case: 8  Current splitfile group: 1

    12āčž      12   �ž           7

    Number of cases read:  8    Number of cases listed:  8
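A likely explanation, offered as a sketch rather than a confirmed diagnosis: with SET UNICODE = YES, string values are stored as UTF-8, where characters such as ī, ā, č and ž take two bytes each. CHAR.LEN and CHAR.SUBSTR count characters, whereas SUBSTR, including SUBSTR used as an assignment target, counts bytes. The loop therefore blanks byte #k when it means character #k: it overwrites the first byte of a two-byte character and leaves the second byte behind as an invalid fragment, which fits the lengths of 6 and 7 and the warning above. The syntax below compares the byte and character lengths of two of the test values; nbytes and nchars are illustrative variable names, not part of the original code.

```spss
new file.
set unicode = yes.

data list free
 /T (a10).
begin data
12(vī
12āčž
end data.

* LENGTH counts bytes while CHAR.LENGTH counts characters.
compute nbytes = length(rtrim(T)).
compute nchars = char.length(T).
list var = T nbytes nchars.
```

In UTF-8, 12(vī is 5 characters but 6 bytes, so the two columns should differ for both rows.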

Solution

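  • How about cycling through T instead, pulling out the numeric characters and rebuilding Z, rather than replacing the non-numeric ones? (Note that my version here uses the older, pre-CHAR. string functions.) Because it works byte by byte, every byte of a multi-byte character is replaced by its own blank, so no partial character is ever left behind in Z.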

    data list free
     /T (a10).
    begin data
    1234
    5678
    absd
    12as
    12(a
    12(vi
    12(vī
    12āčž
    12as23
    end data.
    
    STRING Z (a10).
    STRING #temp (A1).
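    * In Unicode mode LENGTH and SUBSTR count bytes, so this walks T byte by byte.
    COMPUTE #len = LENGTH(RTRIM(T)).
    LOOP #i = 1 to #len.
      COMPUTE #temp = SUBSTR(T,#i,1).
      * Keep a digit byte and replace any other byte with a blank.
      DO IF INDEX('0123456789',#temp) > 0.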
        COMPUTE Z = CONCAT(SUBSTR(Z,1,#i-1),#temp).
      ELSE.
        COMPUTE Z = CONCAT(SUBSTR(Z,1,#i-1)," ").
      END IF. 
    END LOOP.
    EXECUTE.
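In SPSS versions that do have the CHAR. functions, a character-aware variant also seems possible. The sketch below is an illustration under stated assumptions, not part of the original answer: it reads T one character at a time with CHAR.SUBSTR and only ever writes ASCII digits into an initially blank Z. Because Z then holds nothing but single-byte characters, byte position #k in Z coincides with character position #k in T, so the byte-based SUBSTR target is safe here. #c is a hypothetical scratch variable sized A4 so it can hold one UTF-8 character of up to 4 bytes.

```spss
new file.
set unicode = yes.

data list free
 /T (a10).
begin data
1234
absd
12(vī
12āčž
12as23
end data.

string Z (a10).
string #c (a4).
compute Z = ''.
loop #k = 1 to char.length(T).
  * Take the #k-th character of T, not the #k-th byte.
  compute #c = char.substr(T, #k, 1).
  * Z only ever receives ASCII digits, so byte #k of Z is also character #k.
  do if any(#c, '0', '1', '2', '3', '4', '5', '6', '7', '8', '9').
    compute substr(Z, #k, 1) = rtrim(#c).
  end if.
end loop.
compute len = char.length(Z).
list var = T Z len.
```

Unlike the byte-wise loop above, each non-digit character, however many bytes it occupies, should leave exactly one blank, so the digits keep their character positions from T.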