Search code examples
pandasdataframeloopsduplicates

When looping though a dataframe looking for string values. How do I print out rows without duplicating if the search term is found multiple times?


I am importing pandas and reading a csv file.

import pandas as pd

df = pd.read_csv(r'C:\Users\Me\MyFile.csv')

I am converting the dataframe to strings and making the contents lowercase

df_low = df.apply(lambda x: x.astype(str).str.lower() if (x.dytpe == 'object'_ else x)

I'm using user input to search for strings

searchterm = str(input("Enter search term: ").lower())

looping through the dataframe and creating mask, then only returning non-empty results. I'm currently using the .upper command only due to the 'Item Number' having 3 capital letters in the beginning but, of course this is making the entire output capital but that is workable.

for item in df_low:
    mask = df_low[item].str.contains(searchterm)
    df_mask = pd.DataFrame(df_low[mask])
    if not df_mask.empty:
        print(df_mask[['Item Number', 'User Name', 'Short Description']].to_string(index=false).upper())

if df contains the following:

Item Number User Name Long Description Short Description Comments
ITM-1 Leonardo A bunch of words that contain a bunch of stuff words and stuff I wrote words
ITM-2 Donatello Man, this guy didn't even put this in the correct format but at least he wrote words. guy and words N/A
ITM-3 Frank Windows, SQL, RedHat, RHEL, Garbage Technology list Technology stack

Searching for the term 'technology' Expected Output:

Item Number User Name Short Description
ITM-3 FRANK TECHNOLOGY LIST

Due to the loop, the actual output would be the following since the search term is found in both the 'Short Description and 'Comments' fields:

Item Number User Name Short Description
ITM-3 FRANK TECHNOLOGY LIST
Item Number User Name Short Description
ITM-3 FRANK TECHNOLOGY LIST

I tried multiple variations of .drop_duplicates but I know enough to know I don't know how to use that because I have a limited understanding where that would go in the overall code.


Solution

  • Instead of a loop, you can use applymap to filter rows that have the searchterm in any column:

    searchterm = "Technology"
    
    out = df[
        df.applymap(lambda x: isinstance(x, str) and searchterm.lower() in x.lower()).any(
            axis=1
        )
    ]
    out = out[["Item Number", "User Name", "Short Description"]]
    
      Item Number User Name Short Description
    2       ITM-3     Frank   Technology list