I have a dataframe df which contains movies data.
I want to create a new column in df called "drama_movie" that is True for movies that are Dramas and False for all others.
I tried the following code:
df["drama_movie"]=df['listed_in'].isin(["Dramas"])
-> but I receive everything as False in the column drama_movie.
When I try the following code:
df["drama_movie"]=df.apply(lambda x: x['listed_in'] in x['Dramas'], axis=1)
-> I receive a KeyError: 'Dramas'
What works is this code:
df["drama_movie"] = df['listed_in'].str.contains('Dramas', case=False, na=False)
-> But I need to use Python's in operator, and I'm stuck on how to do that. Any suggestions? Thank you for your help.
You can split the strings, explode the resulting lists, and then keep only the rows that match your criterion:
drama_movies = (df.loc[df['listed_in'].str.split(',').explode()
.loc[lambda x: x.isin(['Dramas'])].index])
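To make the split/explode steps easier to follow, here is a minimal sketch on a small made-up frame. The sample rows and the titles are hypothetical, and the extra .str.strip() is an assumption that only matters if the genres in listed_in are separated by ", " (comma plus space) rather than a bare comma.

```python
import pandas as pd

# Hypothetical sample data; the real df is assumed to have a similar 'listed_in' column.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "listed_in": ["Dramas, International Movies", "Comedies", "Action, Dramas", None],
})

# One genre per row; the original row label is repeated for each genre.
# strip() removes the space left after each comma (assumed separator: ", ").
genres = df["listed_in"].str.split(",").explode().str.strip()

# Row labels whose genre list contains exactly "Dramas" (missing values never match).
drama_index = genres[genres == "Dramas"].index.unique()

drama_movies = df.loc[drama_index]
print(drama_movies)
```

Without the strip() step, a value such as " Dramas" (with a leading space) would not compare equal to "Dramas".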
Don't use apply here; use a list comprehension instead:
drama_movies = df[['Dramas' in s.split(',') for s in df['listed_in']]]
# For 200 rows
# apply: 1.16 ms ± 20.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# comprehension: 156 µs ± 262 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
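If the goal is the boolean drama_movie column from the question rather than a filtered frame, the same comprehension can be assigned directly. The following is only a sketch: has_genre is a hypothetical helper, the sample rows are made up, and stripping whitespace assumes the genres may be separated by ", ". Note that isin compares each whole cell with "Dramas", which would explain all-False results when cells hold several genres.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E"],
    "listed_in": [
        "Dramas, International Movies",
        "Comedies",
        "TV Dramas",
        None,
        "Action, Dramas",
    ],
})

def has_genre(value, genre="Dramas"):
    """Return True if `genre` is one of the comma-separated entries in `value`."""
    # Missing values (None/NaN) are not strings, so they count as "not a drama".
    if not isinstance(value, str):
        return False
    # Python's `in` operator on a list tests for an exact element match.
    return genre in [part.strip() for part in value.split(",")]

df["drama_movie"] = [has_genre(s) for s in df["listed_in"]]
print(df[["title", "drama_movie"]])
```

Because in is applied to the list of genres, "TV Dramas" does not count as "Dramas" here. Using genre in value on the raw string would do a substring test instead and match it, much like str.contains.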