Tags: python, pandas, string, dataframe, substring

Pandas filter dataframe columns through substring match


I have a DataFrame with multiple columns, e.g.:

     Name  Age   Fname
0    Alex   10   Alice
1     Bob   12     Bob
2  Clarke   13  clarke

My filter condition is to check whether Name is a case-insensitive substring of the corresponding Fname.

If the condition were equality, something as simple as:

df[df["Name"].str.lower() == df["Fname"].str.lower()]

works. However, I want a substring match, so I thought in would work in place of ==. But that raises an error, apparently because one of the operands is interpreted as a pd.Series. My first question is: why are the two operators interpreted differently?
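A minimal sketch of the difference, using two toy Series that are not from the question: pandas overloads == to compare element by element, whereas x in s calls s.__contains__, which for a Series checks whether x is one of its index labels (not its values) and must produce a single True/False. When x is itself a Series, that label lookup tries to hash it, and Series are unhashable, so a TypeError is raised (the exact message may vary between pandas versions).

```python
import pandas as pd

s = pd.Series(["alex", "bob"])
t = pd.Series(["alice", "bob"])

# == is overloaded element-wise: one boolean per row
eq_mask = s == t
print(eq_mask.tolist())  # [False, True]

# `in` checks the index labels of the right-hand Series, not its values
value_found = "bob" in t  # False: "bob" is a value, not an index label
label_found = 1 in t      # True: 1 is an index label
print(value_found, label_found)

# With a Series on the left, the label lookup tries to hash it and fails
raised = False
try:
    s in t
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

So in never performs a row-by-row substring test on two Series; that has to be done per element.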

Another approach I tried was .str.contains:

df[df["Fname"].str.contains(df["Name"], case=False)]

but this also fails because df["Name"] is passed as a pd.Series. It does, of course, work when the argument is a constant string.

For example, this works:
df[df["Fname"].str.contains("a", case=False)]
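For reference, a small runnable sketch of that working call, rebuilding the example DataFrame from the question: one constant pattern is tested against every row. It also shows a detail worth knowing for this task: by default str.contains treats pat as a regular expression, so a name containing characters like "." would need regex=False to be matched literally (the "a.b" strings below are made up for illustration).

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alex", "Bob", "Clarke"],
     "Age": [10, 12, 13],
     "Fname": ["Alice", "Bob", "clarke"]}
)

# The same constant pattern "a" is checked against every value of Fname
mask = df["Fname"].str.contains("a", case=False)
print(mask.tolist())  # [True, False, True]

# pat is a regex by default: "." matches any character unless regex=False
dots = pd.Series(["a.b", "ab"])
print(dots.str.contains(".").tolist())               # [True, True]
print(dots.str.contains(".", regex=False).tolist())  # [True, False]
```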

Any help resolving this is appreciated.


Solution

  • You can apply the check row by row with DataFrame.apply(..., axis=1):

    >>> df[df.apply(lambda x: x['Name'].lower() in x['Fname'].lower(), axis=1)]
    
         Name  Age   Fname
    1     Bob   12     Bob
    2  Clarke   13  clarke
    

    str.contains expects a single pattern (a string or compiled regex) as its first argument, pat, not a Series, so it cannot test each row against a different pattern.
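A self-contained version of the row-wise approach, assuming both columns contain only strings (a NaN would make .lower() raise). The second mask is an alternative not from the original answer: a list comprehension over zip pairs Name and Fname directly, which typically runs faster than apply because no per-row Series is constructed. Both should select the same rows.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alex", "Bob", "Clarke"],
     "Age": [10, 12, 13],
     "Fname": ["Alice", "Bob", "clarke"]}
)

# Row-wise apply: the lambda receives each row as a Series
mask_apply = df.apply(
    lambda row: row["Name"].lower() in row["Fname"].lower(), axis=1
)

# Alternative: pair the two columns with zip and test each pair in plain Python
mask_zip = [
    name.lower() in fname.lower()
    for name, fname in zip(df["Name"], df["Fname"])
]

# A boolean list of the same length can be used to filter rows directly
result = df[mask_zip]
print(result)
```

Here "alex" is not a substring of "alice", so only the Bob and Clarke rows remain.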