Search code examples
stringsplitsyntaxnumericspss

split String Variable in few numeric Variables in SPSS


I have a string variable with comma separated numbers that I want to split into four numeric variables.

makeArr var1a var1b var1c var1d
6,8,13,10 6 8 13 10
10,11,2 10 11 2
7,1,14,3 7 1 14 3

With:

 IF (CHAR.INDEX(makeArr,',') >= 1) 
   f12a=CHAR.SUBSTR(makeArr,1,CHAR.INDEX(makeArr,',')-1).
 EXECUTE.
 
 IF (CHAR.INDEX(makeArr,',') >= 1) 
   f12b=CHAR.SUBSTR(makeArr,CHAR.INDEX(makeArr,',')+1,CHAR.INDEX(makeArr,',')-1).
 EXECUTE.

I always get the first variable written without any problems. This no longer works with the second variable because it has a different length and the comma is also written here.

So I would need a split at the comma and the division of the numbers over the comma.


Solution

  • Since char.substr will only tell you about the location of the first occurence of the search string, you need to start the second search from a new location - AFTER the first occurence, and this gets more and more complicated as you continue. My suggestion is create a copy of your array variable, which you will cut pieces off as you proceed - so that you are only searching for the first occurence of "," every time.

    First I recreate your example data to demonstrate on.

    data list free/makeArr (a20).  
    begin data  
    "6,8,13,10" "10,11,2" "7,1,14,3"  
    end data.  
    

    Now I copy your array into a new variable #tmp. Note that I add a "," at the end so the syntax stays the same for all parts of the array. I add the "#" at the beginning of the name to make it invisible, you can remove it if you want.
    It is possible to do the following calculation in steps as you started to do, but nicer to loop throug the steps (especially if this is an example for a longer array).

    string f12a f12b f12c f12d #tmp (a20).
    compute #tmp=concat(rtrim(makeArr),","). 
    do repeat nwvr=f12a f12b f12c f12d.
        do  IF #tmp<>"".
            compute nwvr=CHAR.SUBSTR(#tmp,1,CHAR.INDEX(#tmp,',')-1).
            compute #tmp=CHAR.SUBSTR(#tmp,CHAR.INDEX(#tmp,',')+1).
        end if.
    end repeat.
    EXECUTE.