I have two DataFrames. One with multiple columns and other with just one. So what I need is to join based on partial str of a column.
Example:
df1
Name | Classification |
---|---|
A | Transport/Bicycle/Mountain |
B | Transport/City/Bus |
C | Transport/Taxi/City |
D | Transport/City/Uber |
E | Transport/Mountain/Jeep |
df2
Category |
---|
Mountain |
City |
As you can see the order on Classification
column is not well defined.
Desired output:
Name | Classification | Category |
---|---|---|
A | Transport/Bicycle/Mountain | Mountain |
B | Transport/City/Bus | City |
C | Transport/Taxi/City | City |
D | Transport/City/Uber | City |
E | Transport/Mountain/Jeep | Mountain |
I'm stuck at this. Any ideas?
Many thanks in advance.
This implementation does the trick:
def get_cat(val):
for cat in df2['Category']:
if cat in val:
return cat
return None
df['Category'] = df['Classification'].apply(get_cat)
Note: as @Justin Ezequiel pointed out in the comments, you haven't specified what to do when Mountain and City exists in the Classification. Current implementation uses the first Category that matches.