
Why are the totals different before and after aggregating in SPSS?


I have a large dataset with ID as the first column. The second variable, EventName, is a time series, so the same ID appears in more than one row. The next variables identify the primary and duplicate records for each unique ID. The variables after those are MH diagnoses (0 = no; 1 = yes) for each time period; the TypeMH variables go up to 25. The last variable in the example below is the sum of the 1s across MH Types 1-25.

When I run a frequency count on the whole dataset for each MH Type, I get one count for TypeMH1. But when I aggregate by ID to sum the rows that share an ID, the frequency count is lower. See picture 2.

What am I doing wrong? Thanks.

I've tried running the aggregate by sum and by counts. Same results.



Solution

  • From what I see in the outputs, the first frequency table was run on the summing variable New_MH_After_Entry_Counts, which doesn't differentiate between the typemho variables; it just sums them. The label you marked in yellow is wrong. What that line in the table actually means is that in 137 rows of the dataset, the sum of typemho___1 to typemho___25 equals 1.
    That number has nothing to do with 223, which is the number of rows with a 1 in typemho___1 (as shown in the second frequency analysis).
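
To make the distinction concrete, here is a minimal sketch in plain Python rather than SPSS syntax. The rows are made up for illustration and use only three flag columns instead of 25; it is not the asker's data. It shows that "rows where the sum of the flags is 1" and "rows where typemho___1 is 1" are two different counts, and that after grouping by ID the unit being counted changes from rows to IDs.

```python
from collections import defaultdict

# Hypothetical toy rows: (ID, typemho___1, typemho___2, typemho___3).
# Values are invented for illustration only.
rows = [
    (1, 1, 0, 0),  # flag sum = 1, typemho___1 = 1
    (1, 1, 1, 0),  # flag sum = 2, typemho___1 = 1
    (2, 0, 1, 0),  # flag sum = 1, typemho___1 = 0
    (2, 0, 0, 0),  # flag sum = 0, typemho___1 = 0
    (3, 1, 0, 1),  # flag sum = 2, typemho___1 = 1
]

# What a frequency table on the summing variable reports for the value 1:
# rows whose flags add up to exactly 1, whichever flag that is.
rows_where_sum_is_1 = sum(1 for r in rows if sum(r[1:]) == 1)

# What a frequency table on typemho___1 reports for the value 1:
# rows where that one flag is set, regardless of the other flags.
rows_where_type1_is_1 = sum(1 for r in rows if r[1] == 1)

# Grouping by ID and summing typemho___1 (analogous to an aggregate by ID).
type1_per_id = defaultdict(int)
for r in rows:
    type1_per_id[r[0]] += r[1]

# Counting IDs with at least one 1 is a count of IDs, not of rows.
ids_with_type1 = sum(1 for total in type1_per_id.values() if total > 0)

# The grand total of the per-ID sums still matches the row-level count.
total_type1_after_grouping = sum(type1_per_id.values())

print(rows_where_sum_is_1, rows_where_type1_is_1)
print(ids_with_type1, total_type1_after_grouping)
```

In this toy data the two row-level counts already differ, just as 137 and 223 do in the outputs above. The grouped data also has fewer cases than the original, so a frequency table run on it counts IDs rather than rows, even though the overall sum of typemho___1 is unchanged.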