Search code examples
pythonpandasdataframeobjectcategorical-data

Is there a way to number repeated items from a group on a dataframe in pandas?


I have a dataframe with region, client and some deliveries. There is this column used as type of purchase and the first and last purchase are marked as 'first' and 'last' but any in-between delivery is marked as "delivery" Is there a way to transform the delivery and get a tag of "delivery1", "delivery2", etc?

import pandas as pd  
data = [['NY', 'A','FIRST', 10], ['NY', 'A','DELIVERY', 20], ['NY', 'A','DELIVERY', 30], ['NY', 'A','LAST', 25],
       ['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY', 10], ['NY', 'B','LAST', 20],
       ['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY', 10], ['NY', 'A','DELIVERY', 12], ['NY', 'A','DELIVERY', 25], ['NY', 'A','LAST', 20]
       ] 
  
# Create the pandas DataFrame 
df = pd.DataFrame(data, columns = ['Region', 'Client', 'purchaseType', 'price']) 
  
# print dataframe. 
df 

Desired Output:

data2 = [['NY', 'A','FIRST', 10], ['NY', 'A','DELIVERY1', 20], ['NY', 'A','DELIVERY2', 30], ['NY', 'A','LAST', 25],
       ['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY1', 10], ['NY', 'B','LAST', 20],
       ['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY1', 10], ['NY', 'A','DELIVERY2', 12], ['NY', 'A','DELIVERY3', 25], ['NY', 'A','LAST', 20]
       ] 

df2 = pd.DataFrame(data2, columns = ['Region', 'Client', 'purchaseType', 'price']) 
print(df2)

Thanks in advance!


Solution

  • You can use np.where to decide where to add numerical suffix:

    df['purchaseType'] = df.groupby((df['purchaseType']=='FIRST').cumsum())['purchaseType'].transform(
        lambda x: np.where(x=='DELIVERY', x+np.arange(len(x)).astype(str), x)
    )
    
    print(df)
    

    Prints:

       Region Client purchaseType  price
    0      NY      A        FIRST     10
    1      NY      A    DELIVERY1     20
    2      NY      A    DELIVERY2     30
    3      NY      A         LAST     25
    4      NY      B        FIRST     15
    5      NY      B    DELIVERY1     10
    6      NY      B         LAST     20
    7      FL      A        FIRST     15
    8      FL      A    DELIVERY1     10
    9      NY      A    DELIVERY2     12
    10     NY      A    DELIVERY3     25
    11     NY      A         LAST     20