Vlook Up Elements from another dataframe for creating MultiIndex DataFrame in Python

There are 2 Dataframes and one hierarchical index (pandas multiIndex). Dataframe A has a list of ID's and Names. Dataframe B has a list of Name combinations and a similarity score.

I want, based on the multiindex to pick up the values from Dataframe A and check if the combination does exist in DataFrame B. If yes, I want to bring to my Multindex dataframe the similiarity score, otherwise just 0.

DataFrame A (Original Dataframe)

test= pd.DataFrame({'row':['a','b','c','d'],'col_A' : ["Alexis","Alexi","Peter","Pete"]})

test = test.set_index('row');test
Out:   
    row   col_A   
    a     Alexis
    b     Alexi
    c     Peter
    d     Pete

DataFrame B (Name Similarity)

names = pd.DataFrame({'A' : ["Alexis","Alexi","Peter","Pete"]
                    ,'B' : ["Alexi","Alexis","Pete","Peter"]
                    , "similarity" : [0.9,0.9,0.8,0.8]})
Out:
       A       B      similarity
0   Alexis   Alexi         0.9
1   Alexi    Alexis        0.9
2   Peter    Pete          0.8
3   Pete     Peter         0.8

Multiindex

# Creating a Pandas MultiIndex 
arrays = [['a', 'a', 'a', 'b', 'b', 'c'],
          ['b', 'c', 'd', 'c', 'd', 'd']]

tuples = list(zip(*arrays))
indexy = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])

to iterate through the index I use the below function, however, I am not sure how to adjust it, in order to bring the similarity score when exists or 0 when it does not.

a = pd.DataFrame((test.reindex(indexy.get_level_values(0)).values (?) test.reindex(indexy.get_level_values(1))).values,index=indexy,columns=test.columns)

Where a I want to look like this:

   row      similarity
first second           
a     b             0.9
      c             0
      d             0
b     c             0
      d             0
c     d             0.8

Solution

So if you are not too much into the multiindex, here is one way to get your data as you expect:

import pandas as pd
test= pd.DataFrame({'row':['a','b','c','d'],'col_A' : ["Alexis","Alexi","Peter","Pete"]})
names = pd.DataFrame({'A' : ["Alexis","Alexi","Peter","Pete"],
                    'B' : ["Alexi","Alexis","Pete","Peter"],
                    "similarity" : [0.9,0.9,0.8,0.8]})

Note I don't set_index test but you can do it, it would slightly change the following (see the comments). You can create a dataframe a such as:

import itertools
a = pd.DataFrame([p for p in itertools.combinations(test['col_A'], 2)],columns =['A','B'],
                 index=['%s,%s'%p for p in itertools.combinations(test['row'], r=2)])
# here if you did set_index your test, then replace 
# index=['%s,%s'%p for p in itertools.combinations(test['row'], r=2)] by 
# index=['%s,%s'%p for p in itertools.combinations(test.index, r=2)]

And it's look like:

          A      B
a,b  Alexis  Alexi
a,c  Alexis  Peter
a,d  Alexis   Pete
b,c   Alexi  Peter
b,d   Alexi   Pete
c,d   Peter   Pete

Then you can use reset_index (to get current index as a column but it depends of what you want exactly) merge with names on columns A and B, fill nan with 0, drop the two columns A and B, and rename (if necessary):

a = a.reset_index().merge(names, how = 'left', on = ['A','B']).fillna(0).\
     drop(labels = ['A','B'], axis=1).rename(columns = {'index':'row', 'similarity':'col_A'})

Let me know if you can do what you want after

EDIT: with the new output you look for, you can do:

a = pd.DataFrame([p for p in itertools.combinations(test['col_A'], 2)],columns =['A','B'],
                 index=pd.MultiIndex.from_tuples([p for p in itertools.combinations(test.index, r=2)], names=['first', 'second']))

Note: itertools generate tuples that are used in pd.MultiIndex.from_tuples to define your multiindex DF.

Now you can merge (to keep the multiindex, you need to reset_index before and set_index after:

a = a.reset_index().merge(names, how = 'left', on = ['A','B']).fillna(0).\
         drop(labels = ['A','B'], axis=1).set_index(['first', 'second'])