Search code examples
mergeduplicatesspss

SPSS - merging files with duplicate cases of ID variable and new cases/variables


I have an administrative dataset for store visits from multiple years that I am trying to merge into one under the ID variable.

Each dataset has duplicates of an ID that occur during different store visits, annotated by Date. Some of the more recent data files also have new variables (Y) not contained in the old data files. Datasets from different years will also contain different cases indicated by different ID. Also, some variables may be the same for each case but at different dates. I want the merged file to retain these duplicates.

Example data files:

File 1

ID Date X
1  3    4
1  5    3
2  1    4

File 2

ID Date X  Y
1  6    4  2
1  7    1  5
2  8    4  7
3  7    2  3

I want the merged file to continue listing ALL duplicate cases, as such:

ID Date X  Y
1  3    4  .
1  5    3  .
1  6    4  2
1  7    1  5
2  1    4  .
2  8    4  7
3  7    2  3

I then plan to restructure (CASESTOVARS /AUTOFIX=0) the merged file so that it looks like this:

ID Date.1 Date.2 Date.3 Date.4  X.1  X.2  X.3  X.4  Y.1  Y.2  Y.3  Y.4
1  3      5      6      7       4    3    4    1    .    .    2    5
2  1      8      .      .       4    4    .    .    .    7    .    .
3  7      .      .      .       2    .    .    .    3    .    .    .

I am having trouble with the initial merging process, however. I have tried looking up the safest way to merge files when they both have duplicate cases in order to make sure no data are lost in the process. It seems that the "Add Variables" method results in lost values for duplicate variables.

Thanks!

EDIT: If I used the "Add Variables" function and used both the ID and Date variables as the key variables, would that help avoid deletion of duplicate cases?


Solution

  • Why not try add cases instead of add variables? if there are no occurrences of the same Id with the same date it should work OK with the casestovars.

    If there are such cases, you'll need to think what you want to do with them before you can proceed with the casestovars.
    One way would be to aggregate by ID and DATE and decide if you want to e.g. add up the data vars for this case.