Search code examples
pythonpandasstringjoinmerge

Pandas - Create new column w/values from another column based on str contains


I have two DataFrames. One with multiple columns and other with just one. So what I need is to join based on partial str of a column.

Example:

df1

Name Classification
A Transport/Bicycle/Mountain
B Transport/City/Bus
C Transport/Taxi/City
D Transport/City/Uber
E Transport/Mountain/Jeep

df2

Category
Mountain
City

As you can see the order on Classification column is not well defined.

Desired output:

Name Classification Category
A Transport/Bicycle/Mountain Mountain
B Transport/City/Bus City
C Transport/Taxi/City City
D Transport/City/Uber City
E Transport/Mountain/Jeep Mountain

I'm stuck at this. Any ideas?

Many thanks in advance.


Solution

  • This implementation does the trick:

    def get_cat(val):
        for cat in df2['Category']:
            if cat in val:
                return cat
        return None
    
    df['Category'] = df['Classification'].apply(get_cat)
    

    Note: as @Justin Ezequiel pointed out in the comments, you haven't specified what to do when Mountain and City exists in the Classification. Current implementation uses the first Category that matches.