Search code examples
pythonpandasappendlookup

Unable to write lookup output to Pandas dataframe using append loop


I want to lookup column in dataframe (Lookup item) using pandas (or anything else). If the value is found in that column then corresponding value in another column (Corresponding item) of same dataframe is picked up with both recorded into another dataframe.

Example cols are:

  1. lookup_id = [111, 222, 333, 444, 777 , 1089 , 3562 ]
  2. id_number = [111, 23, 333, 444, 10101 ,777 , 222 ]
items = arr.array('i', [111, 222, 333, 444, 777 , 1089 , 3562 ])
new_df = []

for item in items:
    lookup_id = item
    
    id_number = get_score_by_id(df_past, lookup_id)
    if id_number is not None:
       df_append = (f'{lookup_id}, {id_number}')

    else:
       df_append = ('NAN')

print(df_append)

The required output dataframe is shown below:

Lookup item Corresponding item
111 OK
222 OK
333 OK
444 OK
777 OK
1089 NAN
3562 NAN

So here all are found so 'OK' returned and 1089 and 3562 do not exist so NAN recored.

The script has been developed to replace an xls VLOOKUP so it's a vlookup/append but cant get the append to add new fields in next row fresh df.

I can get it to work just printing the output but want the new_df populated and that’s primarily intent of question.

Thanks.


Solution

  • If you need only OK, NAN then you could use .isin() to get True/False
    and later you can convert it to strings.

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'lookup_id':[111, 222, 333, 444, 777, 1089, 3562]})
    df_past = pd.DataFrame({'id_number': [111, 23, 333, 444, 10101, 777, 222]})
    
    df['result'] = df['lookup_id'].isin(df_past['id_number'])
    df['result'] = np.where(df['result'], 'OK', 'NAN')
    
    print(df)
    

    Result:

       lookup_id result
    0        111     OK
    1        222     OK
    2        333     OK
    3        444     OK
    4        777     OK
    5       1089    NAN
    6       3562    NAN
    

    If you need more information from other dataframe then you can use .apply(function) and inside function get more data and format it

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'lookup_id':[111, 222, 333, 444, 777, 1089, 3562]})
    df_past = pd.DataFrame({'id_number': [111, 23, 333, 444, 10101, 777, 222]})
    
    df['result'] = df['lookup_id'].isin(df_past['id_number'])
    
    def function(lookup_id):
        #print('lookup_id:', lookup_id)
    
        row = df_past[ df_past['id_number'] == lookup_id ]
        #print(row)
    
        if not row.empty:
            return f'{lookup_id}, {row.index[0]}'
           
        return 'NAN'
        
    df['result'] = df['lookup_id'].apply(function)
    
    print(df)
    

    Result:

       lookup_id  result
    0        111  111, 0
    1        222  222, 6
    2        333  333, 2
    3        444  444, 3
    4        777  777, 5
    5       1089     NAN
    6       3562     NAN