I have a dataframe and need to extract package info (ML, KG, PZA, LT, UN, etc.) from its Description column. I'm pretty new to pandas. This is the dataframe right now:
SKU | Description |
---|---|
1 | TRIDENT 6S SANDIA 9GR |
2 | CANAST RABBIT F1 A 1UN |
3 | HAND SOAP VITAMIN E 442 ML. |
I need to extract 9GR, 1UN, 442 ML, etc. and put them into another column. Only substrings that match a list of accepted package units should be extracted. These are the possible values:
[GR, UN, LT, OZ]
Any substring in the Description column that matches one of these values should be moved into a Package column and removed from the Description column.
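You can use Series.str.extract with a regex built from the list of units:
# Accepted package units
pkg = ['ML', 'KG', 'PZA', 'LT', 'UN', 'GR']
# Capture digits, optional whitespace, then one of the units, bounded as a whole word
df['package'] = df['Description'].str.extract(fr"\b(\d+\s*(?:{'|'.join(pkg)}))\b")
print(df)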
# Output
   SKU                  Description package
0    1        TRIDENT 6S SANDIA 9GR     9GR
1    2       CANAST RABBIT F1 A 1UN     1UN
2    3  HAND SOAP VITAMIN E 442 ML.  442 ML
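The snippet above fills the package column but leaves Description unchanged, while the question also asks for the matched text to be removed. One possible way to do that is sketched below. It rebuilds the sample dataframe from the question, and the names units and pattern are only illustrative. It also assumes that a trailing period after the unit (as in "442 ML.") should be dropped along with the match.

```python
import re

import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "SKU": [1, 2, 3],
    "Description": [
        "TRIDENT 6S SANDIA 9GR",
        "CANAST RABBIT F1 A 1UN",
        "HAND SOAP VITAMIN E 442 ML.",
    ],
})

pkg = ['ML', 'KG', 'PZA', 'LT', 'UN', 'GR']

# Escape each unit in case one ever contains a regex metacharacter
units = "|".join(map(re.escape, pkg))

# Same extraction as in the answer; expand=False returns a Series
df["package"] = df["Description"].str.extract(
    fr"\b(\d+\s*(?:{units}))\b", expand=False
)

# Assumption: also drop an optional trailing period after the unit
pattern = fr"\b\d+\s*(?:{units})\b\.?"

df["Description"] = (
    df["Description"]
    .str.replace(pattern, "", regex=True)   # remove the package text
    .str.replace(r"\s+", " ", regex=True)   # collapse leftover whitespace
    .str.strip()
)

print(df)
```

Rows with no match get NaN in package and keep their Description as it was. Tokens such as 6S or F1 are left alone because they do not end in one of the listed units.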