python pandas indexing duplicates rename

Change column value on identical indexes

I have the following dataframe:

Sometimes an index value is duplicated, and then I want to change the value in the column 'Hotspot'. So on index 0, the first prograding_feature_polygon_30 should become prograding_feature_polygon_30_1 and the second should become prograding_feature_polygon_30_2.

The same applies to index 1: again the values should become prograding_feature_polygon_30_1 and prograding_feature_polygon_30_2, and so on.

Index values are not always duplicated. When an index value is unique, its value in Hotspot should remain the same. Does anyone know an easy way to do this?

Regards,

Dante

Solution

Data sample

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': np.repeat([*'ABCD'],[2,1,3,1]),
                   'b': [*range(7)]}, 
                  index=np.repeat([*range(4)],[2,1,3,1]))

print(df)

   a  b
0  A  0
0  A  1
1  B  2
2  C  3
2  C  4
2  C  5
3  D  6

Problem

For each duplicated index value, we want to append a consecutive number to the values in column a: A_1, A_2 for index value 0, and C_1, C_2, C_3 for index value 2. Rows whose index value is unique (1 and 3) should be left unchanged.

Solution

# Append _1, _2, ... only on rows whose index value is duplicated
df['a'] = np.where(df.index.duplicated(keep=False),
                   df['a'] + '_' + df.groupby(level=0).cumcount().add(1).astype(str),
                   df['a'])

print(df)

     a  b
0  A_1  0
0  A_2  1
1    B  2
2  C_1  3
2  C_2  4
2  C_3  5
3    D  6
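The same idea can also be kept entirely in pandas by using Series.mask instead of np.where. The sketch below wraps it in a helper called number_duplicates (a hypothetical name, not part of pandas) and applies it to a small made-up frame with the question's Hotspot column. The frame's values are illustrative only.

```python
import pandas as pd


def number_duplicates(s: pd.Series) -> pd.Series:
    """Append _1, _2, ... to values whose index label occurs more than once.

    Hypothetical helper: same logic as the np.where answer, using Series.mask.
    """
    dup = s.index.duplicated(keep=False)            # True for every row of a repeated label
    counter = s.groupby(level=0).cumcount().add(1)  # 1, 2, ... within each index label
    # mask replaces values where dup is True; other rows keep their original value
    return s.mask(dup, s + '_' + counter.astype(str))


# Made-up data in the shape described in the question (values are illustrative).
hotspots = pd.DataFrame(
    {'Hotspot': ['prograding_feature_polygon_30'] * 4 + ['prograding_feature_polygon_31']},
    index=[0, 0, 1, 1, 2],
)
hotspots['Hotspot'] = number_duplicates(hotspots['Hotspot'])
print(hotspots)
```

Series.mask replaces values where the condition is True (the inverse of Series.where), so the condition reads the same way as in the np.where call. Because the result keeps the Series index, it can be assigned straight back to the column.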

Explanation

Use df.index.duplicated with keep=False to get an array that is True for every row whose index value occurs more than once, and False otherwise.
Use this array as the condition in np.where: where it is True, take df.a plus the consecutive number; otherwise keep df.a.
Use df.groupby(level=0) to group on the index, and apply .cumcount to enumerate the rows within each group. add(1) makes the count start at 1 instead of 0. Finally, astype(str) converts the numbers to strings so they can be concatenated with df.a.
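To see what each of the steps above produces, the intermediate values can be printed on the sample frame. This sketch simply rebuilds the answer's df and splits the one-liner into named pieces; dup, counter and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the answer.
df = pd.DataFrame({'a': np.repeat([*'ABCD'], [2, 1, 3, 1]),
                   'b': [*range(7)]},
                  index=np.repeat([*range(4)], [2, 1, 3, 1]))

# Step 1: flag every row whose index label occurs more than once.
dup = df.index.duplicated(keep=False)
print(dup)

# Step 2: number the rows within each index label, starting at 1, as strings.
counter = df.groupby(level=0).cumcount().add(1).astype(str)
print(counter.tolist())

# Step 3: take the numbered value for flagged rows, the original value otherwise.
result = np.where(dup, df['a'] + '_' + counter, df['a'])
print(result)
```

Note that cumcount also gives the unique index values 1 and 3 a count of 1; the duplicated mask is what keeps B and D from becoming B_1 and D_1.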