python dataframe merge concatenation cosine-similarity

How to combine two DataFrames by applying a function to the values at the same position


I'm trying to combine two DataFrames by applying a function to the values at the same position in each.

Each element of both DataFrames is a list representing the vector for the item at [col, row].

df1 :

   A      B   
0  vec1   vec2      
1  vec1   vec2      
2  vec1   vec2   

df2 :

   A      B         
0  vec5   vec5     
1  vec6   vec6    
2  vec7   vec7  

function : gensim.matutils.cossim(vec1,vec2)

Expected new_df :
   A                   B
0  cossim(vec1,vec5)   cossim(vec2,vec5)   
1  cossim(vec1,vec6)   cossim(vec2,vec6)   
2  cossim(vec1,vec7)   cossim(vec2,vec7)

I implemented the following code:

for column in df1():
    new_df[column] = df1[column].apply(matutils.cossim(df1[x],df2.loc[0,column]))

I get this error:

AttributeError: 'list' object has no attribute 'sqrt'


Solution

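  • You can define your own function for a single pair of vectors and wrap it with numpy.vectorize, so that it is applied element-wise to the cells at the same position in both DataFrames.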

    import numpy as np
    import pandas as pd
    
    from sklearn.metrics.pairwise import cosine_similarity
    
    
    X = pd.DataFrame([[[0.1,0.1], [0.2,0.2]], [[0.3,0.3], [0.4,0.4]]])
    Y = pd.DataFrame([[[0.1,0.1], [0.2,0.2]], [[0.3,0.3], [0.4,0.4]]])
    
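    def func(vecx, vecy):
        # cosine_similarity expects 2D inputs, so wrap each vector as a
        # single-row matrix and pull the scalar out of the 1x1 result.
        return cosine_similarity([vecx], [vecy])[0, 0]
    
    # np.vectorize calls func once per pair of cells; otypes fixes the output dtype.
    F = np.vectorize(func, otypes=[float])
    
    print(pd.DataFrame(F(X, Y)))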
    

    This prints:

         0    1
    0  1.0  1.0
    1  1.0  1.0
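
If you also want to keep the row and column labels of the inputs, a plain loop over matching columns is another option. This is only a minimal sketch and not part of the answer above: `cosine` and `combine_cells` are hypothetical helper names, the vectors are assumed to be dense lists of numbers with no all-zero vector, and both frames are assumed to share the same index and columns.

```python
import numpy as np
import pandas as pd


def cosine(u, v):
    """Cosine similarity of two dense 1-D vectors (assumes neither is all zeros)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def combine_cells(left, right, func):
    """Apply func to the cells at the same position in two frames.

    Hypothetical helper: assumes both frames have the same index and columns.
    """
    return pd.DataFrame(
        {
            # Pair up the values of each column row by row.
            col: [func(a, b) for a, b in zip(left[col], right[col])]
            for col in left.columns
        },
        index=left.index,
    )


# Small made-up frames whose cells are list-valued vectors.
df1 = pd.DataFrame({"A": [[1.0, 0.0], [1.0, 0.0]], "B": [[0.0, 1.0], [0.0, 1.0]]})
df2 = pd.DataFrame({"A": [[1.0, 0.0], [0.0, 1.0]], "B": [[1.0, 0.0], [0.0, 1.0]]})

new_df = combine_cells(df1, df2, cosine)
print(new_df)
```

Like np.vectorize, this still calls the function once per cell, so it is a convenience for readability and label handling rather than a speed-up.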