I have a table with same product with different colors. The quantity is different for each. However, the format is so different for the product.
PROD QUAN
product abcdefg (white) 20
product abcdefg blue color 30
product abcdefg - pink 20
product zy 80
I want to -
The final output expected is
PROD QUAN
product abcdefg (white) 20
product zy 80
Match atleast 80%-90% of the text in PROD and
How about 70%?
Try:
import pandas as pd
df = pd.DataFrame({'PROD': {0: 'product abcdefg (white)',
1: 'product abcdefg blue color',
2: 'product abcdefg - pink',
3: 'product zy'},
'QUAN': {0: 20, 1: 30, 2: 20, 3: 80}})
# slice at (say 70%)...
cutOff = round((len(df.loc[0,'PROD'])/100) * 70)
df['PROD2'] = df['PROD'].str.slice(stop=cutOff)
df = df[df['QUAN'] == df.groupby('PROD2')['QUAN'].transform('min')]
df = df.drop_duplicates(subset=['PROD2','QUAN'], keep="first").reset_index(drop=True)
del df['PROD2']
print(df)
PROD QUAN
0 product abcdefg (white) 20
1 product zy 80