I have a dataframe with region, client and some deliveries. There is this column used as type of purchase and the first and last purchase are marked as 'first' and 'last' but any in-between delivery is marked as "delivery" Is there a way to transform the delivery and get a tag of "delivery1", "delivery2", etc?
import pandas as pd
data = [['NY', 'A','FIRST', 10], ['NY', 'A','DELIVERY', 20], ['NY', 'A','DELIVERY', 30], ['NY', 'A','LAST', 25],
['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY', 10], ['NY', 'B','LAST', 20],
['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY', 10], ['NY', 'A','DELIVERY', 12], ['NY', 'A','DELIVERY', 25], ['NY', 'A','LAST', 20]
]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['Region', 'Client', 'purchaseType', 'price'])
# print dataframe.
df
Desired Output:
data2 = [['NY', 'A','FIRST', 10], ['NY', 'A','DELIVERY1', 20], ['NY', 'A','DELIVERY2', 30], ['NY', 'A','LAST', 25],
['NY', 'B','FIRST', 15], ['NY', 'B','DELIVERY1', 10], ['NY', 'B','LAST', 20],
['FL', 'A','FIRST', 15], ['FL', 'A','DELIVERY1', 10], ['NY', 'A','DELIVERY2', 12], ['NY', 'A','DELIVERY3', 25], ['NY', 'A','LAST', 20]
]
df2 = pd.DataFrame(data2, columns = ['Region', 'Client', 'purchaseType', 'price'])
print(df2)
Thanks in advance!
You can use np.where
to decide where to add numerical suffix:
df['purchaseType'] = df.groupby((df['purchaseType']=='FIRST').cumsum())['purchaseType'].transform(
lambda x: np.where(x=='DELIVERY', x+np.arange(len(x)).astype(str), x)
)
print(df)
Prints:
Region Client purchaseType price
0 NY A FIRST 10
1 NY A DELIVERY1 20
2 NY A DELIVERY2 30
3 NY A LAST 25
4 NY B FIRST 15
5 NY B DELIVERY1 10
6 NY B LAST 20
7 FL A FIRST 15
8 FL A DELIVERY1 10
9 NY A DELIVERY2 12
10 NY A DELIVERY3 25
11 NY A LAST 20