
SPSS: How do I generate ID numbers from a ClientID variable that contains duplicate IDs?


I have a dataset containing thousands of rows, in which each person is assigned a ClientID. I would like to use the ClientID variable to generate a new ID variable that starts at 1. Some ClientIDs are duplicated, so I would like to make sure that duplicate ClientIDs are given the same ID number. ClientIDs are strings, and my data has to be sorted by TimeStamp.

My data looks like:

ClientID     TimeStamp
15137.45692  15/03/2021
10489.15789  03/02/2021
14143.96745  01/01/2021
15137.45692  15/01/2021
15137.45692  27/02/2021
14143.96745  08/03/2021

I would like it to look like:

ID  ClientID     TimeStamp
1   14143.96745  01/01/2021
2   15137.45692  15/01/2021
3   10489.15789  03/02/2021
2   15137.45692  27/02/2021
1   14143.96745  08/03/2021
2   15137.45692  15/03/2021

How do I do this?

I would do it in Excel, but I have over 250k rows of data and Excel keeps crashing.

Thanks


Solution

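  • The following syntax sets ID=1 on the first case and then adds 1 whenever a new ClientID starts:

    * Sort so that all rows for the same ClientID are adjacent.
    sort cases by ClientID.
    compute ID=1.
    * From the second case on, carry the previous ID forward and add 1 when ClientID changes.
    if $casenum>1 ID=lag(ID)+(ClientID<>lag(ClientID)).
    exe.

    Note that this numbers the clients in sorted ClientID order and leaves the file sorted by ClientID, so sort by TimeStamp again afterwards if you need that order back.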
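
    If the IDs should follow the order in which each client first appears by TimeStamp (as in the expected output in the question), one possible variation is sketched below. It assumes TimeStamp is stored as a date variable rather than a string, so that sorting on it is chronological. FirstSeen is a hypothetical helper variable name, not something in the original data.

    ```spss
    * Assumption: TimeStamp is a numeric date variable, not a string.
    * Attach each client's earliest TimeStamp to all of that client's rows.
    AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=ClientID /FirstSeen=MIN(TimeStamp).
    * Group rows by client, ordered by when each client first appears.
    SORT CASES BY FirstSeen ClientID TimeStamp.
    COMPUTE ID=1.
    * Same lag logic as above: add 1 whenever ClientID changes.
    IF $casenum>1 ID=LAG(ID)+(ClientID<>LAG(ClientID)).
    EXECUTE.
    * Restore chronological order and drop the helper variable.
    SORT CASES BY TimeStamp.
    DELETE VARIABLES FirstSeen.
    ```

    On the six sample rows this should give 14143.96745 the ID 1, 15137.45692 the ID 2 and 10489.15789 the ID 3, with the file ending up sorted by TimeStamp.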
    

    EDIT:
    Here's another nice way to do it, using the RANK command:

    RANK VARIABLES=ClientID (A) /RANK /PRINT=NO /TIES=CONDENSE.
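
    One caveat: as far as I know, RANK only accepts numeric variables, so it may fail if ClientID is stored as a string, as it is in the question. A possible alternative in that case is AUTORECODE, which assigns consecutive integers starting at 1 in ascending order of the original values. A minimal sketch, assuming ClientID is a string variable and TimeStamp is a date variable:

    ```spss
    * Assumption: ClientID is a string variable and ID does not exist yet.
    * Identical ClientID values receive the same integer code in ID.
    AUTORECODE VARIABLES=ClientID /INTO ID.
    * AUTORECODE does not reorder the cases, so sort explicitly if needed.
    SORT CASES BY TimeStamp.
    ```

    As with the sorted-lag approach, the numbers follow the sorted order of the ClientID values rather than the order of first appearance.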