I have a dataset which contains thousands of rows which each person assigned a ClientID. I would like to use the ClientID variable to generate a new ID variable which starts at 1. Some ClientIDs are duplicated so I would like to make sure that duplicate ClientIDs are given the same ID number. Client IDs are string and my data has to be sorted by TimeStamp.
My data looks like:
ClientID TimeStamp
15137.45692 15/03/2021
10489.15789 03/02/2021
14143.96745 01/01/2021
15137.45692 15/01/2021
15137.45692 27/02/2021
14143.96745 08/03/2021
I would like it to look like:
ID ClientID TimeStamp
1 14143.96745 01/01/2021
2 15137.45692 15/01/2021
3 10489.15789 03/02/2021
2 15137.45692 27/02/2021
1 14143.96745 08/03/2021
2 15137.45692 15/03/2021
How do I do this?
I would do it in excel but I have over 250k rows of data and excel keeps crashing.
Thanks
The following syntax creates ID=1 and then adds 1 only in case of a new ClientID:
sort cases by ClientID.
compute ID=1.
if $casenum>1 ID=lag(ID)+(ClientID<>lag(ClientID)).
exe.
EDIT:
Here's another nice way to do it using rank
function:
RANK VARIABLES=ClientID (A) /RANK /PRINT=NO /TIES=CONDENSE.