Search code examples
pythonexceldataframeanacondalookup

Lookup 3 or more matching numbers in dataframe


I'm using python and having trouble finding a code that matches 3 or more numbers and return the Id number. For example, I would like to lookup these 6 numbers in one sheet ( 2,4,6,8,9,10 )

enter image description here

lookup 6 number above and find 3 or more numbers in this sheet below.enter image description here

I would like my result to be something like this in another excel sheet but only return the ID only if it matches 3 or more numbers. If you look at id u it matches 2,4,6,8 and id 6 matches 6,8,9,10 and id 7 matches 2,6,8

enter image description here


Solution

  • Here is what I think you are trying to achieve. I have made the assumption that you have imported both excel files and have named them df1 and df2 respectively. With df1 being the smaller dataframe containing the values to search. I created my own dataframes for testing purposes. I only recreated df2 up to ID=7.

    Here is the code to build the tables:

    import pandas as pd
    
    cols = ['Num1','Num2','Num3','Num4','Num5','Num6']
    df1 = pd.DataFrame([[2,4,6,8,9,10]], columns=cols)
    df2 = pd.DataFrame([[1,1,2,4,5,6,8],
                       [2,5,6,20,22,23,34],
                       [3,8,12,13,34,45,46],
                       [4,9,10,14,29,32,33],
                       [5,1,22,13,23,33,35],
                       [6,1,6,7,8,9,10],
                       [7,0,2,3,5,6,8]], 
                       columns = ['Id', 'Num1','Num2','Num3','Num4','Num5','Num6'])
    

    And now the code to find the IDs:

    # convert the values in the first dataframe to a list
    vals_to_find = df1.iloc[0].tolist()
    
    # Print the values to find
    print("Vals to find:", vals_to_find)
    
    # Create an empty list to hold the matching IDs
    matching_ids = []
    
    # iterate through the big dataframe
    for index, row in df2.iterrows():
    
        rowlist = row.tolist()       # convert the row to a list
    
        # keep the id for later, and extract the other values for evaluation
        id = rowlist[0]
        vals = rowlist[1:]
        
        # count the number of values in one list against another list
        counter = sum(elem in vals_to_find for elem in vals)
    
        # If the number of matches is greater than 2, then grab the ID
        if counter > 2:
            matching_ids.append({'ID': id})
    
    # Print the matching IDs 
    print('Matching IDS:', matching_ids)
    
    # Convert to dataframe and export to excel spreadsheet
    df3 = pd.DataFrame(matching_ids)
    df3.to_excel("output.xlsx", index=False)
    

    OUTPUT:

    Vals to find: [2, 4, 6, 8, 9, 10]
    Matching IDS: [{'ID': 1}, {'ID': 6}, {'ID': 7}]
    

    And the output.xlsx:

    enter image description here