Combining add cases and add variables by merging files in SPSS


I would like to merge several SPSS files. The variable PAID identifies the person. The files also contain the variable ID, which indicates the moment of measurement: ID=1 means the data come from measurement one, ID=2 from measurement two, and so on. However, not all data files contain the same moments of measurement.

I have already read the following post, but that has not completely answered my question: SPSS - merging files with duplicate cases of ID variable and new cases/variables

Example data files

Data file 1:

PAID  ID  X1  X2  X3  X4
1     1   3   4   4   5
2     1   3   4   5   6
3     1   3   4   4   6
4     1   .   .   .   .

Data file 2:

PAID  ID  X5  X6  X7  
1     1   1   1   2
1     2   1   2   1
2     1   1   2   2
2     2   2   2   2
3     1   1   1   1
3     2   1   .   .
4     1   1   1   1
4     2   2   2   2

I want the following result:

PAID  ID  X1  X2  X3  X4  X5  X6  X7
1     1   3   4   4   5   1   1   2
1     2   .   .   .   .   1   2   1
2     1   3   4   5   6   1   2   2
2     2   .   .   .   .   2   2   2
3     1   3   4   4   6   1   1   1
3     2   .   .   .   .   1   .   .
4     1   .   .   .   .   1   1   1
4     2   .   .   .   .   2   2   2

I think I have to use some combination of the functions add cases and add variables. However, is this possible within SPSS? And if so, how can I do this?

Thanks in advance!


Solution

  • This will do the job:

    match files /file='path\DataFile1.sav' /file='path\DataFile2.sav' /by paid id.
    

    Please note, though, that both files need to be sorted in ascending order by paid and id before running the match.

    To demonstrate with your sample data:

    *first preparing demonstration data.
    DATA LIST list/paid id x1 to x4 (6f).
    begin data.
    1,1,3,4,4,5
    2,1,3,4,5,6
    3,1,3,4,4,6
    4,1, , , ,
    end data.
    * instead of creating the data, you can get your original data:
    * get file="path\file name 1.sav".
    sort cases by paid id.
    dataset name DataFile1.
    
    
    DATA LIST list/paid id x5 to x7 (5f).
    begin data.
    1,1,1,1,2
    1,2,1,2,1
    2,1,1,2,2
    2,2,2,2,2
    3,1,1,1,1
    3,2,1, ,
    4,1,1,1,1
    4,2,2,2,2
    end data.
    sort cases by paid id.
    dataset name DataFile2.
    
    match files /file=DataFile1 /file=DataFile2 /by paid id.
    exe.
    

    The result looks like this (blank cells are system-missing values):

    paid id x1  x2  x3  x4  x5  x6  x7
    1    1  3   4   4   5   1   1   2
    1    2                  1   2   1
    2    1  3   4   5   6   1   2   2
    2    2                  2   2   2
    3    1  3   4   4   6   1   1   1
    3    2                  1       
    4    1                  1   1   1
    4    2                  2   2   2
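
In the merged result you cannot tell "no record in DataFile1" (e.g. paid 1, id 2) apart from "record present but all values missing" (paid 4, id 1). If that matters, the /in subcommand of match files can add a flag for each input file. A minimal sketch, assuming DataFile1 and DataFile2 are the sorted datasets from the demonstration and that you run this in place of the plain match files command; in1 and in2 are names I made up for illustration:

```spss
* Sketch only: in1 and in2 are hypothetical flag variable names.
* Each flag is 1 when the case was present in that input file, 0 otherwise.
match files
  /file=DataFile1 /in=in1
  /file=DataFile2 /in=in2
  /by paid id.
execute.
```

With the sample data, in2 should be 1 on every row, and in1 should be 1 only on the id=1 rows, including paid 4, whose x1 to x4 are missing.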