
Python: how to count collaborations between pairs in pandas dataframe?


I have a dataframe like this

df = pd.DataFrame({'Item': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Name': ['Tom', 'John', 'Paul', 'Tom', 'Frank', 'Tom', 'John', 'Richard', 'James'],
                   'Weight': [2, 2, 2, 3, 3, 5, 5, 5, 5]})
df 
Item Name  Weight
A    Tom     2
A    John    2
A    Paul    2
B    Tom     3
B    Frank   3
C    Tom     5
C    John    5
C    Richard 5
C    James   5 

For each person, I want the list of the people who share an item with them, where each shared item contributes the reciprocal of its weight (summed when a pair shares several items):

df1 
Name              People                          Times
Tom     [John, Paul, Frank, Richard, James]       [(1/2+1/5),1/2,1/3,1/5,1/5]
John    [Tom, Paul, Richard, James]               [(1/2+1/5),1/2,1/5,1/5]
Paul    [Tom, John]                               [1/2,1/2]
Frank   [Tom]                                     [1/3]
Richard [Tom, John, James]                        [1/5,1/5,1/5]
James   [Tom, John, Richard]                      [1/5,1/5,1/5]

In order to count the number of collaborations without considering the weight, I did:

#merge M:N by column Item
df1 = pd.merge(df, df, on=['Item'])

#remove rows pairing a name with itself (Name_x == Name_y)
df1 = df1[df1['Name_x'] != df1['Name_y']]
#print(df1)

#create lists
df1 = df1.groupby('Name_x')['Name_y'].apply(lambda x: x.tolist()).reset_index()
print(df1)
    Name_x                                     Name_y
0    Frank                                      [Tom]
1    James                       [Tom, John, Richard]
2     John           [Tom, Paul, Tom, Richard, James]
3     Paul                                [Tom, John]
4  Richard                         [Tom, John, James]
5      Tom  [John, Paul, Frank, John, Richard, James]


#get counts with np.unique (assumes import numpy as np)
df1['People'] = df1['Name_y'].apply(lambda a: np.unique(a, return_counts=True)[0])
df1['times'] = df1['Name_y'].apply(lambda a: np.unique(a, return_counts=True)[1])
#remove column Name_y
df1 = df1.drop('Name_y', axis=1).rename(columns={'Name_x':'Name'})
print(df1)
      Name                               People            times
0    Frank                                [Tom]              [1]
1    James                 [John, Richard, Tom]        [1, 1, 1]
2     John          [James, Paul, Richard, Tom]     [1, 1, 1, 2]
3     Paul                          [John, Tom]           [1, 1]
4  Richard                   [James, John, Tom]        [1, 1, 1]
5      Tom  [Frank, James, John, Paul, Richard]  [1, 1, 2, 1, 1]
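The `np.unique` step can equivalently be replaced by a single `groupby(...).size()` on the merged pairs, which counts how many items each pair shares directly. A self-contained sketch of that alternative (using the same data as above):

```python
import pandas as pd

df = pd.DataFrame({'Item': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Name': ['Tom', 'John', 'Paul', 'Tom', 'Frank', 'Tom', 'John', 'Richard', 'James'],
                   'Weight': [2, 2, 2, 3, 3, 5, 5, 5, 5]})

# self-merge on Item pairs every name with every co-occurring name
pairs = pd.merge(df, df, on='Item')
pairs = pairs[pairs['Name_x'] != pairs['Name_y']]

# count how many items each (Name_x, Name_y) pair shares
counts = pairs.groupby(['Name_x', 'Name_y']).size()

# collect collaborators and counts per person
out = pd.DataFrame({
    'People': counts.groupby(level='Name_x').apply(
        lambda s: s.index.get_level_values('Name_y').tolist()),
    'times': counts.groupby(level='Name_x').apply(lambda s: s.tolist()),
})
```

Because `groupby` sorts its keys, the lists come out in the same alphabetical order as the `np.unique` version.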

In the last dataframe I have the count of collaborations between all the pairs; however, I would like the weighted count of collaborations.


Solution

  • Starting with:

    df = pd.DataFrame({'Item': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                       'Name': ['Tom', 'John', 'Paul', 'Tom', 'Frank', 'Tom', 'John', 'Richard', 'James'],
                       'Weight': [2, 2, 2, 3, 3, 5, 5, 5, 5]})
    
    df1 = pd.merge(df, df, on=['Item'])
    df1 = df1[~(df1['Name_x'] == df1['Name_y'])].set_index(['Name_x', 'Name_y']).drop(['Item', 'Weight_y'], axis=1)
    

    You could use .apply() to create the values and .unstack() for the wide format:

    collab = df1.groupby(level=['Name_x', 'Name_y']).apply(lambda x: np.sum(1/x)).unstack().loc[:, 'Weight_x']
    
    Name_y      Frank  James  John  Paul  Richard       Tom
    Name_x                                                 
    Frank         NaN    NaN   NaN   NaN      NaN  0.333333
    James         NaN    NaN   0.2   NaN      0.2  0.200000
    John          NaN    0.2   NaN   0.5      0.2  0.700000
    Paul          NaN    NaN   0.5   NaN      NaN  0.500000
    Richard       NaN    0.2   0.2   NaN      NaN  0.200000
    Tom      0.333333    0.2   0.7   0.5      0.2       NaN
    

    then iterate over the rows and convert to lists:

df = pd.DataFrame(columns=['People', 'Times'])
for p, data in collab.iterrows():
    s = data.dropna()
    df.loc[p] = [s.index.tolist(), s.tolist()]
    
                                          People  \
    Frank                                  [Tom]   
    James                   [John, Richard, Tom]   
    John             [James, Paul, Richard, Tom]   
    Paul                             [John, Tom]   
    Richard                   [James, John, Tom]   
    Tom      [Frank, James, John, Paul, Richard]   
    
                                            Times  
    Frank                        [0.333333333333]  
    James                         [0.2, 0.2, 0.2]  
    John                     [0.2, 0.5, 0.2, 0.7]  
    Paul                               [0.5, 0.5]  
    Richard                       [0.2, 0.2, 0.2]  
    Tom      [0.333333333333, 0.2, 0.7, 0.5, 0.2]
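Pieced together (and updated to Python 3), the whole weighted computation is short. The final loop can also be replaced by a row-wise `apply()`; here is a sketch of the end-to-end pipeline under that variation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Item': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'Name': ['Tom', 'John', 'Paul', 'Tom', 'Frank', 'Tom', 'John', 'Richard', 'James'],
                   'Weight': [2, 2, 2, 3, 3, 5, 5, 5, 5]})

# pair every name with its co-occurring names, drop self-pairs
pairs = pd.merge(df, df, on='Item')
pairs = pairs[pairs['Name_x'] != pairs['Name_y']]

# 1/weight per shared item, summed over all items a pair shares,
# then pivoted to a wide Name_x x Name_y matrix
collab = (pairs.assign(inv=1 / pairs['Weight_x'])
               .groupby(['Name_x', 'Name_y'])['inv'].sum()
               .unstack())

# row-wise apply() instead of the explicit iterrows() loop
out = pd.DataFrame({
    'People': collab.apply(lambda r: r.dropna().index.tolist(), axis=1),
    'Times': collab.apply(lambda r: r.dropna().tolist(), axis=1),
})
```

`assign(inv=...)` computes the reciprocal weight once before grouping, which avoids relying on `1/x` inside `apply` and keeps each step a plain vectorized operation.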