python pandas dataframe data-science

How to use Python's "in" operator on a dataframe to search for a string and return a boolean in a new column of the same dataframe


I have a dataframe df which contains movie data.

I want to create a new column in df called "drama_movie" which contains True for movies that are Dramas and False for those that are not.

I tried the following code: df["drama_movie"]=df['listed_in'].isin(["Dramas"])

-> but every value in the column drama_movie comes out as False.

When I try the following code: df["drama_movie"]=df.apply(lambda x: x['listed_in'] in x['Dramas'], axis=1)

-> I get a KeyError: 'Dramas'.

What works is this code: df["drama_movie"] = df['listed_in'].str.contains('Dramas', case=False, na=False)

-> But I need to use Python's "in" operator, and I'm stuck. Any suggestions? Thank you for your help.


Solution

  • You can split the strings, explode the resulting lists, and then keep only the rows that match your criterion:

    drama_movies = (df.loc[df['listed_in'].str.split(',').explode()
                                          .loc[lambda x: x.isin(['Dramas'])].index])
    

    Don't use apply here; use a list comprehension instead:

    drama_movies = df[['Dramas' in s.split(',') for s in df['listed_in']]]
    
    # For 200 rows
    # apply: 1.16 ms ± 20.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
    # comprehension: 156 µs ± 262 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
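Both snippets above filter rows, whereas the question asks for a boolean column. A minimal sketch of that variant follows, using the same comprehension idea with Python's "in" operator. The sample data and the helper name has_genre are hypothetical (the real df was only shown as an image), and the sketch assumes listed_in holds comma-separated genres that may carry spaces after the commas and may contain missing values. For context: isin compares each whole cell against "Dramas", so a cell like "Dramas, International Movies" never matches, and x['Dramas'] looks up a column named Dramas, which does not exist, hence the KeyError.

```python
import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "listed_in": [
        "Dramas, International Movies",
        "Comedies",
        "TV Dramas, Crime TV Shows",
        None,
    ],
})


def has_genre(cell, genre="Dramas"):
    """Return True if `genre` is one of the comma-separated entries in `cell`."""
    if not isinstance(cell, str):  # treat missing values (None/NaN) as no match
        return False
    # Strip whitespace so " International Movies" compares as "International Movies".
    return genre in [part.strip() for part in cell.split(",")]


# One boolean per row, assigned as a new column.
df["drama_movie"] = [has_genre(s) for s in df["listed_in"]]
print(df["drama_movie"].tolist())  # [True, False, False, False]
```

Testing membership in the split list matches whole genres only, so "TV Dramas" does not count as "Dramas". If a substring match is what you want, 'Dramas' in s on the raw string would behave like the str.contains call from the question, apart from case sensitivity.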