Search code examples
aggregatestring-concatenationspss

SPSS group by rows and concatenate string into one variable


I'm trying to export SPSS metadata to a custom format using SPSS syntax. The dataset with value labels contains one or more labels for the variables.

However, now I want to concatenate the value labels into one string per variable. For example for the variable SEX combine or group the rows F/Female and M/Male into one variable F=Female;M=Male;. I already concatenated the code and labels into a new variable using Compute CodeValueLabel = concat(Code,'=',ValueLabel). so the starting point for the source dataset is like this:

+--------------+------+----------------+------------------+
| VarName      | Code | ValueLabel     | CodeValueLabel   |
+--------------+------+----------------+------------------+
| SEX          | F    | Female         | F=Female         |
| SEX          | M    | Male           | M=Male           |
| ICFORM       | 1    | Yes            | 1=Yes            |
| LIMIT_DETECT | 0    | Too low        | 0=Too low        |
| LIMIT_DETECT | 1    | Normal         | 1=Normal         |
| LIMIT_DETECT | 2    | Too high       | 2=Too high       |
| LIMIT_DETECT | 9    | Not applicable | 9=Not applicable |
+--------------+------+----------------+------------------+

The goal is to get a dataset something like this:

+--------------+-------------------------------------------------+
| VarName      | group_and_concatenate                           |
+--------------+-------------------------------------------------+
| SEX          | F=Female;M=Male;                                |
| ICFORM       | 1=Yes;                                          |
| LIMIT_DETECT | 0=Too low;1=Normal;2=Too high;9=Not applicable; |
+--------------+-------------------------------------------------+

I tried using CASESTOVARS but that creates separate variables, so several variables not just one single string variable. I'm starting to suspect that I'm running up against the limits of what SPSS can do. Although maybe it's possible using some AGGREGATE or OMS trickery, any ideas on how to do this?


Solution

  • First I recreate your example here to demonstrate on:

    data list list/varName CodeValueLabel (2a30).
    begin data
    "SEX"  "F=Female"
    "SEX"  "M=Male"
    "ICFORM"  "1=Yes"
    "LIMIT_DETECT"  "0=Too low"
    "LIMIT_DETECT"  "1=Normal"
    "LIMIT_DETECT"  "2=Too high"
    "LIMIT_DETECT"  "9=Not applicable"
    end data.
    

    Now to work:

    * sorting to make sure all labels are bunched together.
    sort cases by varName CodeValueLabel.
    string combineall (a300).
    * adding ";" .
    compute combineall=concat(rtrim(CodeValueLabel), ";").
    * if this is the same varname as last row, attach the two together.
    if $casenum>1 and varName=lag(varName)  
         combineall=concat(rtrim(lag(combineall)), " ", rtrim(combineall)).
    exe.
    *now to select only relevant lines - first I identify them.
    match files /file=* /last=selectthis /by varName.
    *now we can delete the rest.
    select if selectthis=1.
    exe.
    

    NOTE: make combineall wide enough to contain all the values of your most populated variable.