I have 3 columns in the dataframe. object, id and price. I want fill the blanks by reading the id column and discover which price should I use. For exemple: If the id ends in (A,B or C) the price should be 30 but if it's end (7A,7B or 7C) the price should be 50, If the id ends in (E,F or G) the price should be 20, If the id ends in (O,M or N), the price should be 10.
Here is the dataframe:
object id price
0 laptop 24A 30
1 laptop 37C NaN
2 laptop 21O NaN
3 laptop 17C 50
4 laptop 55A 30
5 laptop 34N NaN
6 laptop 05E 20
7 laptop 29B NaN
8 laptop 22M 10
9 laptop 62F NaN
10 laptop 23G 20
11 laptop 61O NaN
12 laptop 27A NaN
Expected output:
object id price
0 laptop 24A 30
1 laptop 37C 50
2 laptop 21O 10
3 laptop 17C 50
4 laptop 55A 30
5 laptop 34N 10
6 laptop 05E 20
7 laptop 29B 30
8 laptop 22M 10
9 laptop 62F 20
10 laptop 23G 20
11 laptop 61O 10
12 laptop 27A 50
You can use np.select
with str.contains
conditions:
conditions = {
30: df.id.str.contains('[^7][ABC]$'),
50: df.id.str.contains('7[ABC]$'),
20: df.id.str.contains('[EFG]$'),
10: df.id.str.contains('[OMN]$'),
}
df.price = np.select(conditions.values(), conditions.keys())
# object id price
# 0 laptop 24A 30
# 1 laptop 37C 50
# 2 laptop 21O 10
# 3 laptop 17C 50
# 4 laptop 55A 30
# 5 laptop 34N 10
# 6 laptop 05E 20
# 7 laptop 29B 30
# 8 laptop 22M 10
# 9 laptop 62F 20
# 10 laptop 23G 20
# 11 laptop 61O 10
# 12 laptop 27A 50
You could also use loc
masking if you want to use fillna
:
for price, condition in conditions.items():
df.loc[condition, 'price'] = df.loc[condition, 'price'].fillna(price)
If you want to further restrict by df.object
, you can add the df.object
condition with &
:
conditions = {
30: df.object.eq('laptop') & df.id.str.contains('[^7][ABC]$'),
50: df.object.eq('laptop') & df.id.str.contains('7[ABC]$'),
20: df.object.eq('laptop') & df.id.str.contains('[EFG]$'),
10: df.object.eq('laptop') & df.id.str.contains('[OMN]$'),
1000: df.object.eq('phone') & df.id.str.contains('[OMN]$'),
}
If you really want to use a function, you can apply
along rows (axis=1
), but row-apply is much slower and not advised when you have vectorized options like np.select
:
def price(row):
result = np.nan
if row.object == 'laptop':
if row.id[-2:] in ['7A', '7B', '7C']:
result = 50
elif row.id[-1] in list('ABC'):
result = 30
elif row.id[-1] in list('EFG'):
result = 20
elif row.id[-1] in list('OMN'):
result = 10
elif row.object == 'phone':
if row.id[-2:] in ['7A', '7B', '7C']:
result = 5000
...
return result
df.price = df.apply(price, axis=1)