I have a dataframe with multiple columns, e.g.:
     Name  Age   Fname
0    Alex   10   Alice
1     Bob   12     Bob
2  Clarke   13  clarke
My filter condition is to check whether Name is a (case-insensitive) substring of the corresponding Fname.
If the condition were equality, something as simple as:
df[df["Name"].str.lower() == df["Fname"].str.lower()]
works. However, I want a substring match, so instead of == I thought the in operator would work. But that raises an error, because it interprets one of the operands as a pd.Series. My first question is: why this difference in interpretation?
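For reference, the two operators dispatch to different special methods. a == b calls Series.__eq__, which pandas implements element-wise. a in b calls b.__contains__(a), and for a Series that checks whether a is one of the index labels (not the values), so it needs a single hashable key; a Series is not hashable, hence the TypeError. A small sketch, with the example frame rebuilt by hand from the table above:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bob", "Clarke"],
    "Age": [10, 12, 13],
    "Fname": ["Alice", "Bob", "clarke"],
})

# `==` is overloaded by pandas: it compares the two Series element by element.
eq_mask = df["Name"].str.lower() == df["Fname"].str.lower()
print(eq_mask.tolist())  # [False, True, True]

# `in` calls Series.__contains__, which looks the key up in the index labels,
# not in the values, and is not element-wise.
print(0 in df["Name"])      # True: 0 is an index label
print("Bob" in df["Name"])  # False: "Bob" is a value, not a label

# Passing a whole Series as the key fails because a Series is unhashable.
try:
    df["Name"].str.lower() in df["Fname"].str.lower()
except TypeError as exc:
    print("TypeError:", exc)
```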
Another way I tried was using .str.contains:
df[df["Fname"].str.contains(df["Name"], case=False)]
which also receives df["Name"] as one whole pd.Series rather than one value per row. Of course, it does work with a constant string as the argument, e.g.:
df[df["Fname"].str.contains("a", case=False)]
How can I get this row-wise, case-insensitive substring match? Any help is appreciated.
You can apply the check to each row with DataFrame.apply and axis=1:
>>> df[df.apply(lambda x: x['Name'].lower() in x['Fname'].lower(), axis=1)]
     Name  Age   Fname
1     Bob   12     Bob
2  Clarke   13  clarke
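A self-contained version of the same row-wise apply filter, with the example frame rebuilt by hand from the question's table:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alex", "Bob", "Clarke"],
    "Age": [10, 12, 13],
    "Fname": ["Alice", "Bob", "clarke"],
})

# With axis=1, each row is passed to the lambda as a Series indexed by
# column name, so x["Name"] and x["Fname"] are plain Python strings and
# `in` performs an ordinary substring test.
mask = df.apply(lambda x: x["Name"].lower() in x["Fname"].lower(), axis=1)
result = df[mask]
print(result)
```

This keeps the Bob and Clarke rows, matching the output above.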
str.contains takes a single string (a character sequence or regular expression) as its first argument, pat, not a Series.
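Since pat has to be a single string, another option (not part of the original answer) is to pair the two columns up in plain Python with zip and build the boolean mask with a list comprehension; this avoids constructing a Series for every row the way apply with axis=1 does. A sketch, where substring_mask is a hypothetical helper name and missing values (NaN) are assumed not to occur:

```python
import pandas as pd

def substring_mask(needles, haystacks):
    # Hypothetical helper: True where needles[i] is a case-insensitive
    # substring of haystacks[i]. Assumes both columns hold strings (no NaN).
    return [n.lower() in h.lower() for n, h in zip(needles, haystacks)]

df = pd.DataFrame({
    "Name": ["Alex", "Bob", "Clarke"],
    "Age": [10, 12, 13],
    "Fname": ["Alice", "Bob", "clarke"],
})

# Constant pattern: str.contains works as intended.
const_hits = df[df["Fname"].str.contains("a", case=False)]
print(const_hits)

# Per-row pattern: build the mask by hand; a boolean list of the same
# length as the frame can be used directly as a row filter.
row_hits = df[substring_mask(df["Name"], df["Fname"])]
print(row_hits)
```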