I have a large set of survey data in SPSS where around 15% of respondents answered the survey more than once (this was not intended). I have formulated a systematic method to determine which cases to keep but am not sure how to write the loop to perform this task.
The variables I have are:
Here are what my data look like:
ID SurveyComplete Duplicate PrimaryFirst MatchSequence Date keep
123 1 1 1 1 07162015 .
123 1 1 0 2 07182015 .
456 0 1 1 1 07152015 .
456 1 1 0 2 07192015 .
789 0 1 1 1 07112015 .
789 0 1 0 2 07182015 .
789 0 1 0 3 07212015 .
012 1 0 1 1 07122015 .
Theoretically, I would like to determine the following in the order below:
And here is the resulting keep column:
ID SurveyComplete Duplicate PrimaryFirst MatchSequence Date keep
123 1 1 1 1 07162015 1
123 1 1 0 2 07182015 0
456 0 1 1 1 07152015 0
456 1 1 0 2 07192015 1
789 0 1 1 1 07112015 0
789 0 1 0 2 07182015 0
789 0 1 0 3 07212015 1
012 1 0 1 1 07122015 1
Ideally I would like to be able to complete this in SPSS syntax without plugins as my work doesn't take kindly to software add-ons. Any help that can be provided is much appreciated!
After every step an AGGREGATE function determines for every ID if a decision was already made. An ID that already has a decision will be taken out of the game, undecided IDs go on to the next step:
* creating fake data to play around with.
* note I added an extra line for ID=456 to demonstrate choice between multiple non-primary lines.
DATA LIST list (", ") / ID SurveyComplete Duplicate PrimaryFirst MatchSequence Date.
begin data
123, 1, 1, 1, 1, 7162015
123, 1, 1, 0, 2, 7182015
456, 0, 1, 1, 1, 7152015
456, 1, 1, 0, 2, 7192015
456, 1, 1, 0, 3, 7192015
789, 0, 1, 1, 1, 7112015
789, 0, 1, 0, 2, 7182015
789, 0, 1, 0, 3, 7212015
12, 1, 0, 1, 1, 7122015
end data.
execute.
* now starting work on defining the KEEP variable.
if (PrimaryFirst = 1 AND SurveyComplete = 1) keep=1.
if (PrimaryFirst = 0 AND SurveyComplete = 1) NotPrimarySeq=MatchSequence.
aggregate /outfile=* mode=addvariables /break=ID /decided=max(keep)/NotPrimarySeq_min=min(NotPrimarySeq).
if missing(decided) and (PrimaryFirst = 0 AND SurveyComplete = 1) keep=(NotPrimarySeq=NotPrimarySeq_min).
aggregate/outfile=* mode=addvariables overwritevars=yes /break=ID/decided=max(keep)/Date_max=MAX(Date).
if missing(decided) keep=(date=date_max).
recode keep (miss=0).
execute.