Search code examples
spss

How to equalize the number of rows per unit in an SPSS file


I have a file with a different number of rows for every "unit", and I'd like all the units to have the same number of rows, by adding the right number of empty rows per unit in the data.

For example:

data list list/ unit serial someData.
begin data.
1 1 54
2 1 57
2 2 87
2 3 91
3 1 17
3 2 43
end data.

what i'd like to get to is this:

1 1 54  
1 2  .  
1 3  .  
2 1 57  
2 2 87  
2 3 91  
3 1 17  
3 2 43  
3 3  .  

I've worked with simple workarounds, for example casestovars => varstocases (keeping nulls), or preparing a base file with all the lines with unit names and serials, and then matching it with the data file so I end up with all the lines and all the data.
Could anyone suggest a more direct (\elegant\efficient\simple) approach? Thanks!


Solution

  • Cartesian product is what you require here.

    Using your example data and downloading the Custom Extension Command, you can solve as below:

    data list list/ unit serial someData.
    begin data.
    1 1 54
    2 1 57
    2 2 87
    2 3 91
    3 1 17
    3 2 43
    end data.
    DATASET NAME ds0.
    DATASET ACTIVATE ds0.
    STATS CARTPROD VAR1=unit VAR2=serial /SAVE OUTFILE="C:\Temp\dsCart".
    SORT CASES BY unit serial.
    MATCH FILES FILE=* /BY unit serial /FIRST=Primary.
    SELECT IF Primary.
    MATCH FILES FILE=* /FILE=ds0  /BY unit serial /DROP=Primary.
    EXE.
    

    I'm not sure how efficient this Custom Extension Command is so you may want to experiment with different flavours of using STATS CARTPROD. An alternative approach would be to create two datasets (left and right) with your unique unit and serial values and then process these through the STATS CARTPROD command.