I have the following dataframe:
Sometimes the index is duplicated, and in that case I want to change the value in the 'Hotspot' column. On index 0, the first prograding_feature_polygon_30 should be changed to prograding_feature_polygon_30_1 and the second to prograding_feature_polygon_30_2.
The same applies on index 1, where the values should again be changed to prograding_feature_polygon_30_1 and prograding_feature_polygon_30_2. And so on...
Index values are not always duplicated; where they are not, the value in 'Hotspot' should remain the same. Does anyone know an easy way to do this?
Regards,
Dante
Data sample
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': np.repeat([*'ABCD'], [2, 1, 3, 1]),
                   'b': [*range(7)]},
                  index=np.repeat([*range(4)], [2, 1, 3, 1]))
print(df)
   a  b
0  A  0
0  A  1
1  B  2
2  C  3
2  C  4
2  C  5
3  D  6
Problem
For each duplicate in the index, we want to add a consecutive number to the values in column a. So, A_1, A_2 for index value 0, and C_1, C_2, C_3 for index value 2. Values without duplicates (index values 1 and 3) should be unaffected.
Solution
df.a = np.where(df.index.duplicated(keep=False),
                df.a + '_' + df.groupby(level=0).cumcount().add(1).astype(str),
                df.a)
print(df)
     a  b
0  A_1  0
0  A_2  1
1    B  2
2  C_1  3
2  C_2  4
2  C_3  5
3    D  6
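Applied to the 'Hotspot' column from the question, the same idea might look like the sketch below. The question's dataframe is not shown, so the rows here are made up for illustration, and number_duplicates is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def number_duplicates(frame, column):
    """Return a copy of `frame` with _1, _2, ... appended to `column`
    on rows whose index value occurs more than once."""
    out = frame.copy()
    # True for every row whose index value appears at least twice.
    is_dup = out.index.duplicated(keep=False)
    # 1-based position of each row within its index group, as strings.
    counter = out.groupby(level=0).cumcount().add(1).astype(str)
    out[column] = np.where(is_dup, out[column] + '_' + counter, out[column])
    return out


# Hypothetical data (assumption): index values 0 and 1 are duplicated, 2 is not.
# 'prograding_feature_polygon_31' is an invented value for the unique row.
hotspots = pd.DataFrame(
    {'Hotspot': ['prograding_feature_polygon_30'] * 4
                + ['prograding_feature_polygon_31']},
    index=[0, 0, 1, 1, 2])

renamed = number_duplicates(hotspots, 'Hotspot')
print(renamed)
```

Returning a copy leaves the original frame untouched; assigning to the column in place, as in the answer, works just as well if that is preferred.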
Explanation
Use df.index.duplicated with param keep=False to get an array with True for duplicates, False for non-duplicates.
Use np.where on that array. If True, we want df.a + consecutive number, else simply df.a.
Use df.groupby on the index (level=0), and apply .cumcount to enumerate items per group. Use add(1) to start at 1 instead of 0. Finally, use astype(str), so that the counter can be concatenated with the strings in df.a.
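To see what each piece contributes, the one-liner can be split into its intermediate values. This is only a sketch on the same sample data; the names is_dup, counter and result are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer.
df = pd.DataFrame({'a': np.repeat([*'ABCD'], [2, 1, 3, 1]),
                   'b': [*range(7)]},
                  index=np.repeat([*range(4)], [2, 1, 3, 1]))

# Step 1: True wherever the index value occurs more than once.
is_dup = df.index.duplicated(keep=False)

# Step 2: 1-based position of each row within its index group, as strings.
counter = df.groupby(level=0).cumcount().add(1).astype(str)

# Step 3: append the counter only on the duplicated rows.
result = np.where(is_dup, df.a + '_' + counter, df.a)

print(is_dup.tolist())   # [True, True, False, True, True, True, False]
print(counter.tolist())  # ['1', '2', '1', '1', '2', '3', '1']
print(result.tolist())   # ['A_1', 'A_2', 'B', 'C_1', 'C_2', 'C_3', 'D']
```

Note that counter is computed for every row, including the non-duplicated ones; np.where simply discards it where is_dup is False.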