Search code examples
pythonpandasdataframexlsx

I need help on searching a list of values on pandas Dataframe, and then separating the row the value is found


What im looking to achieve: I was wondering what would be the best way to find any value of a list in a pandas Dataframe, and then to separate the row where the value is found.

Here is what i Tried

def searchXlsx(xlsx_file: str, numbers: list) -> list:
    
    with open(os.path.realpath(xlsx_file), "rb") as i:
        i = pd.read_excel(i)
        print(i[i.isin(numbers)])

i would like to return the row it is located


Solution

  • You can use .isin:

    import numpy as np
    import pandas as pd
    
    searchvals = np.array([1,45,87,9]) #The values to find
    
    #Create a random dataframe of integers with 100k rows and 10 columns
    size = (100000, 10)
    columns = [f"col_{x}" for x in range(size[1])]
    df = pd.DataFrame(data=np.random.randint(low=0, high=1000, size=size), columns=columns)
    
    matches = df.isin(searchvals).any(axis=1) #A series of True/False where each value correspond to a row 
    
    df2 = df.loc[matches] #Extract the matching rows into a new dataframe
    df2.head()
    #       col_0  col_1  col_2  col_3  col_4  col_5  col_6  col_7  col_8  col_9
    # 34       793    893    435    312    812     87    990    552    496    397
    # 47       818    902    478    479    736     83    135     45    180    224
    # 55         9    750    118    910    205    866    617    232    506    755
    # 94       326    303    868      9    583    310    668    614    720     50
    # 99       167    148    586    570    196    473    829     87    266    194