python pandas apply

Apply a function on two pandas tables


I have the following two tables:

>>> import numpy as np
>>> import pandas as pd
>>> df1 = pd.DataFrame(data={'1': ['john', '10', 'john'],
...                         '2': ['mike', '30', 'ana'],
...                         '3': ['ana', '20', 'mike'],
...                         '4': ['eve', 'eve', 'eve'],
...                         '5': ['10', np.nan, '10'],
...                         '6': [np.nan, np.nan, '20']},
...                   index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))
>>> df1
        1     2     3    4    5    6
index
ind1   john  mike   ana  eve   10  NaN
ind2     10    30    20  eve  NaN  NaN
ind3   john   ana  mike  eve   10   20


>>> df2 = pd.DataFrame(data={'first_n': [4, 4, 3]},
...                    index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))
>>> df2
    first_n
index
ind1         4
ind2         4
ind3         3

I also have the following function that reverses a list and gets the first n non-NA elements:

def get_rev_first_n(row, top_n):
    rev_row = [x for x in row[::-1] if x == x]
    return rev_row[:top_n]

>>> get_rev_first_n(['john', 'mike', 'ana', 'eve', '10', np.nan], 4)
['10', 'eve', 'ana', 'mike']

How would I apply this function to the two tables so that it takes in both df1 and df2 and outputs either a list or columns?


Solution

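  • df = pd.concat([df1, df2], axis=1)
    df.apply(lambda row: get_rev_first_n(row.drop('first_n'), row['first_n']), axis=1)

    axis=1 calls the function once per row. The default, axis=0, calls it once per column, which is not what is wanted here.

    row['first_n'] is passed as the second argument (top_n) of get_rev_first_n, so each row uses its own value from df2. Passing args=(4,) instead would use the same top_n for every row.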
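
For reference, here is a minimal sketch that skips the concatenation and looks up each row's first_n in df2 by index label. It covers both outputs the question asks for, a list per row or columns. The names as_lists and as_columns are illustrative and not from the question. It assumes df1 and df2 share the same index.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    data={'1': ['john', '10', 'john'],
          '2': ['mike', '30', 'ana'],
          '3': ['ana', '20', 'mike'],
          '4': ['eve', 'eve', 'eve'],
          '5': ['10', np.nan, '10'],
          '6': [np.nan, np.nan, '20']},
    index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))
df2 = pd.DataFrame(data={'first_n': [4, 4, 3]},
                   index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))


def get_rev_first_n(row, top_n):
    # x == x is False only for NaN, so missing values are dropped
    rev_row = [x for x in row[::-1] if x == x]
    return rev_row[:top_n]


# One list per row: row.name is the index label, used to look up first_n in df2
as_lists = df1.apply(
    lambda row: get_rev_first_n(row, int(df2.loc[row.name, 'first_n'])),
    axis=1)

# Columns instead of lists: shorter rows are padded with missing values
as_columns = pd.DataFrame(as_lists.tolist(), index=as_lists.index)

print(as_lists)
print(as_columns)
```

Here as_lists is a Series holding one list per index label, and as_columns has as many columns as the longest list.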