I have a trip data where I have start date, end date, start location and end location. I need to generate a key for this data set.
First, sort the data based on start date and then for every start and end date pair if the end location of previous row and start location of current row are same, assign same ID else assign new id. If date are different by default assign new ID.
This is the output I require. [For input just delete the Key column]
Start End Start Location End Location Key
06-11-2020 17-11-2020 A B 1
17-11-2020 17-11-2020 B C 2
17-11-2020 17-11-2020 C D 2
19-11-2020 19-11-2020 E F 3
19-11-2020 19-11-2020 G A 4
19-11-2020 19-11-2020 A H 4
19-11-2020 19-11-2020 H G 4
19-11-2020 19-11-2020 G A 4
20-11-2020 20-11-2020 E H 5
20-11-2020 20-11-2020 H I 5
20-11-2020 20-11-2020 J A 6
21-11-2020 21-11-2020 E G 7
21-11-2020 21-11-2020 K A 8
21-11-2020 21-11-2020 E A 9
21-11-2020 21-11-2020 E G 10
21-11-2020 21-11-2020 G A 10
You need another key to group with Start
and End Start
columns according your requirement (check if the current Location
is the same as previous End Location
). After that, use ngroup
to get the group number:
df['Key'] = (df.sort_values('Start')
.assign(Loc=df['Location'].ne(df['End Location'].shift()).cumsum())
.groupby(['Start', 'End Start', 'Loc'])
.ngroup().add(1).drop(columns='Loc'))
Output:
>>> df
Start End Start Location End Location Key
0 06-11-2020 17-11-2020 A B 1
1 17-11-2020 17-11-2020 B C 2
2 17-11-2020 17-11-2020 C D 2
3 19-11-2020 19-11-2020 E F 3
4 19-11-2020 19-11-2020 G A 4
5 19-11-2020 19-11-2020 A H 4
6 19-11-2020 19-11-2020 H G 4
7 19-11-2020 19-11-2020 G A 4
8 20-11-2020 20-11-2020 E H 5
9 20-11-2020 20-11-2020 H I 5
10 20-11-2020 20-11-2020 J A 6
11 21-11-2020 21-11-2020 E G 7
12 21-11-2020 21-11-2020 K A 8
13 21-11-2020 21-11-2020 E A 9
14 21-11-2020 21-11-2020 E G 10
15 21-11-2020 21-11-2020 G A 10
Details about Loc
:
# Loc = df['Location'].ne(df['End Location'].shift()).cumsum().rename('Loc')
>>> pd.concat([df[['Location', 'End Location']], Loc], axis=1)
Location End Location Loc
0 A B 1
1 B C 1
2 C D 1
3 E F 2
4 G A 3
5 A H 3
6 H G 3
7 G A 3
8 E H 4
9 H I 4
10 J A 5
11 E G 6
12 K A 7
13 E A 8
14 E G 9
15 G A 9