
SPSS LAG Function


I have an SPSS dataset like this, and I would like to identify whether a date is a "duplicate" of the previous date for the same ID:

ID   CorrDate    

39   07/24/2017  
39   07/25/2017  
39   07/27/2017  
39   07/27/2017  
91   03/01/2017  
99   07/04/2017  
999  02/22/2017  
999  02/22/2017  
999  02/22/2017  
999  02/22/2017        

I tried the following syntax using the LAG function:

SORT CASES BY ID(A) CorrDate(A).

IF (ID=LAG(ID) AND CorrDate ne LAG(CorrDate)) Duplicate = 0. 
EXECUTE.

IF (ID=LAG(ID) AND CorrDate eq LAG(CorrDate)) Duplicate = 1. 
EXECUTE.

However, this did not appear to yield accurate results, so I tried the following commands to see if I could determine the source of the problem:

COMPUTE PreviousID=LAG(ID).
COMPUTE PreviousDate=LAG(CorrDate).
EXECUTE.

IF (ID=PreviousID) AND (CorrDate~=PreviousDate) Duplicate = 0. 
EXECUTE.

IF (ID=PreviousID) AND (CorrDate=PreviousDate) Duplicate = 1. 
EXECUTE.

Both approaches yielded the following output, which does not correctly flag the duplicates for IDs 39 and 999:

ID  PreviousID   CorrDate    PreviousDate  Duplicate

39  39           07/24/2017  07/23/2017    0
39  39           07/25/2017  07/24/2017    0
39  39           07/27/2017  07/25/2017    0
39  39           07/27/2017  07/27/2017    0
91  39           03/01/2017  07/27/2017    .
99  91           07/04/2017  03/01/2017    .
999 99           02/22/2017  07/04/2017    .
999 999          02/22/2017  02/22/2017    0
999 999          02/22/2017  02/22/2017    0
999 999          02/22/2017  02/22/2017    1

Am I sorting incorrectly? Or do I need to specify another lag option? Thanks for any assistance!


Solution

  • Both of your methods for finding the duplicates are fine and should work, but here are two more efficient ways:

    aggregate outfile=* mode=addvariables /break=ID CorrDate /occurrences=n.
    

    This adds a new variable, occurrences, holding the number of times each combination of ID and CorrDate occurs in the data; any value above 1 marks a duplicated combination.

    If you want more options (e.g. automatically selecting one of the duplicates to keep), use the menu Data > Identify Duplicate Cases and choose the options you need.

    As for the cases that don't seem to work: if SPSS says those two dates are not equal, they aren't. As @horace_vr says, the values probably also contain a time of day. You can see this in the data by changing the display format to one that includes the time, or by changing the variable type to numeric; either way the difference becomes visible.
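Building on the aggregate command in the answer, here is a minimal sketch that turns the count into the 0/1 Duplicate flag from the question. It assumes the question's variable names. Unlike the LAG approach, it flags every case in a repeated ID/date group, including the first one, and it compares CorrDate exactly, so a hidden time of day will still keep visually identical dates apart.

```spss
* Count the cases that share each ID/CorrDate combination.
* Skip this step if occurrences has already been added by the aggregate command in the answer.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID CorrDate
  /occurrences=N.
* A logical expression evaluates to 1 when true and 0 when false.
COMPUTE Duplicate = (occurrences > 1).
FORMATS Duplicate (F1.0).
EXECUTE.
```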
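For the Data > Identify Duplicate Cases route, the syntax that dialog pastes is, as far as I recall, built around MATCH FILES with /FIRST and /LAST indicator variables. The following is a simplified sketch of that idea, not the dialog's exact output; FirstInGroup is a hypothetical variable name. It marks only the second and later cases of each ID/date combination, which matches the question's goal of flagging a date that repeats an earlier one for the same ID.

```spss
* MATCH FILES with /BY requires the file to be sorted on the BY variables.
SORT CASES BY ID(A) CorrDate(A).
* FirstInGroup (hypothetical name) is 1 for the first case of each ID/CorrDate combination and 0 otherwise.
MATCH FILES /FILE=* /BY ID CorrDate /FIRST=FirstInGroup.
* Every later case in the same combination is a duplicate.
COMPUTE Duplicate = 1 - FirstInGroup.
EXECUTE.
```

Unlike the question's IF statements, this leaves no missing values: single cases such as ID 91 get 0.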
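To confirm the time-of-day explanation and then compare on the calendar date only, here is a sketch that assumes CorrDate is an SPSS date variable (stored internally as seconds). CorrDay is a hypothetical helper variable. XDATE.DATE returns the date portion of a date or date-time value, so two cases on the same day compare equal even if their times differ.

```spss
* Display CorrDate including its time of day, which a date-only format hides.
FORMATS CorrDate (DATETIME20).
* CorrDay (hypothetical name) keeps only the date part of CorrDate.
COMPUTE CorrDay = XDATE.DATE(CorrDate).
FORMATS CorrDay (ADATE10).
SORT CASES BY ID(A) CorrDay(A).
* Default to 0; the first case has no previous case, so the IF condition is missing and it stays 0.
COMPUTE Duplicate = 0.
IF (ID = LAG(ID) AND CorrDay = LAG(CorrDay)) Duplicate = 1.
EXECUTE.
```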