python pandas dataframe data-analysis

Pandas: Changing the value of a column based on a string existing in the column


I have a list of movies and I want to change the value of the Genres column to 0 if the string "Action" exists in it, or to 1 if the string "Drama" exists. If both exist, the value should be 0, since the genre "Action" is more important.

For example, let's say I have the table below:

Genres
Action Comedy Adventure
Drama Crime Horror
Action Drama Adventure

I want it to change to this:

Genres
0
1
0

Any help will be greatly appreciated! Thank you!


Solution

  • Use numpy.select; rows that match neither condition are set to NaN via the default parameter:

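    import numpy as np
    import pandas as pd

    # if Genres holds strings, test for substrings
    m1 = df['Genres'].str.contains('Drama')
    m2 = df['Genres'].str.contains('Action')

    # if Genres holds lists of genres, test membership
    # (wrap in np.array so the masks work element-wise)
    m1 = np.array(['Drama' in x for x in df['Genres']])
    m2 = np.array(['Action' in x for x in df['Genres']])

    # np.select uses the first condition that matches, so listing m2 (Action)
    # first gives Action priority when both genres are present
    df['Genres'] = np.select([m2, m1], [0, 1], default=np.nan)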
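A self-contained sketch of the string case, rebuilding the sample table from the question (the mask names is_action and is_drama are just illustrative). Because the NaN default is a float, the resulting column has a float dtype (0.0, 1.0, 0.0) rather than integers. Note also that str.contains does substring matching, so a genre such as "Melodrama" would also count as "Drama".

```python
import numpy as np
import pandas as pd

# Sample data from the question: one space-separated genre string per row
df = pd.DataFrame({'Genres': ['Action Comedy Adventure',
                              'Drama Crime Horror',
                              'Action Drama Adventure']})

# Boolean masks: does each row mention the genre (substring match)?
is_action = df['Genres'].str.contains('Action')
is_drama = df['Genres'].str.contains('Drama')

# The first matching condition wins, so Action (0) takes priority over Drama (1);
# rows containing neither genre fall back to NaN, which makes the column float
df['Genres'] = np.select([is_action, is_drama], [0, 1], default=np.nan)
print(df)
```

If an integer column is preferred, passing an integer default (for example -1 as a hypothetical "neither" code) instead of np.nan avoids the float conversion.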