I have a large database with ID as the first column. The second variable, EventName, is a time series, so the same ID appears in more than one time period. The next variables identify the primary and duplicate records for each unique ID. The variables after those are MH diagnoses (0 = no; 1 = yes) for each time period; the TypeMH variables go up to 25. The last variable in the example below is the sum of the 1s across TypeMH 1-25.
When I run a frequency count on the whole database for each MH Type, I get one number for counts of TypeMH1. But when I aggregate the IDs to sum all of the same IDs, the frequency count is lower. See picture 2.
What am I doing wrong? Thanks.
I've tried running the aggregate by sum and by counts. Same results.
From what I see in the outputs, the first frequencies ran on the summing variable New_MH_After_Entry_Counts, which doesn't differentiate between the typemho variables; it just sums them. The label you marked in yellow is wrong. What that line in the table actually means is that in 137 lines in the dataset the sum of typemho___1 to typemho___25 is 1.
This number has nothing to do with 223, which is the number of lines marked 1 in typemho___1 (as presented in the second frequency analysis).
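To make the difference concrete, here is a minimal sketch in plain Python rather than SPSS syntax. The rows are invented for illustration and use three flags instead of 25; only the variable names typemho___1 and New_MH_After_Entry_Counts come from your output. It shows that "rows where the sum is 1" and "rows where typemho___1 is 1" are separate counts. The last count, distinct IDs ever flagged on typemho___1, is my assumption about what a per-ID aggregate would report, and it can be lower than the row count because repeated IDs collapse into one.

```python
# Toy rows, invented for illustration: 3 diagnosis flags instead of 25.
rows = [
    {"ID": 1, "typemho___1": 1, "typemho___2": 0, "typemho___3": 0},
    {"ID": 1, "typemho___1": 1, "typemho___2": 1, "typemho___3": 0},
    {"ID": 2, "typemho___1": 0, "typemho___2": 1, "typemho___3": 0},
    {"ID": 3, "typemho___1": 1, "typemho___2": 1, "typemho___3": 1},
    {"ID": 3, "typemho___1": 0, "typemho___2": 0, "typemho___3": 0},
    {"ID": 4, "typemho___1": 1, "typemho___2": 1, "typemho___3": 0},
]
type_cols = ["typemho___1", "typemho___2", "typemho___3"]

# Row-wise sum across all flags, analogous to New_MH_After_Entry_Counts.
row_sums = [sum(r[c] for c in type_cols) for r in rows]

# Rows whose flags add up to exactly 1 (any single diagnosis, not only type 1).
rows_with_sum_1 = sum(1 for s in row_sums if s == 1)

# Rows flagged 1 on typemho___1 specifically.
rows_with_type1 = sum(1 for r in rows if r["typemho___1"] == 1)

# Distinct IDs flagged on typemho___1 at least once (assumed per-ID view).
ids_with_type1 = len({r["ID"] for r in rows if r["typemho___1"] == 1})

print(rows_with_sum_1)  # 2
print(rows_with_type1)  # 4
print(ids_with_type1)   # 3
```

In this toy data the three counts are 2, 4 and 3, so a frequency table on the summed variable cannot be read as a count for any one typemho variable.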