Create a symmetric matrix from a pairwise list in Python for clustering with scikit-learn DBSCAN

My goal is to cluster with DBSCAN from scikit-learn using a precomputed similarity matrix. I have a list of feature dicts. I generate the unique pairs from that list and have a function that calculates the similarity between the two items of each pair. Now I want to turn the result into a symmetric matrix that can be used as input for the clustering algorithm. I think groupby may be helpful, but I am not sure how to go about it. Here is sample code that produces a list of pairs with a similarity measure. The id field in the original list is the unique row identifier.

import itertools
import random


def add_similarity(listdict):
    # Flatten each pair of dicts into one record and attach a similarity score.
    random.seed(10)
    newlistdist = []
    for tup0, tup1 in listdict:
        newdict = {}
        for key, value in tup0.items():
            newdict[key + "_1"] = value
        for key, value in tup1.items():
            newdict[key + "_2"] = value
        # random.random() stands in for the real similarity function here
        newdict["similarity"] = random.random()
        newlistdist.append(newdict)
    return newlistdist


def generatesymm():
    listdict = [{'feature1': 4, 'feature2': 2, "id": 100},
                {'feature1': 3, 'feature2': 2, "id": 200},
                {'feature1': 4, 'feature2': 2, "id": 300}]
    pairs = list(itertools.combinations(listdict, 2))
    newlistdict = add_similarity(pairs)
    return newlistdict

Running this code gives:

    [{'id_2': 200, 'feature1_2': 3, 'feature2_2': 2, 'feature2_1': 2, 'feature1_1': 4, 'similarity': 0.571, 'id_1': 100},
     {'id_2': 300, 'feature1_2': 4, 'feature2_2': 2, 'feature2_1': 2, 'feature1_1': 4, 'similarity': 0.428, 'id_1': 100},
     {'id_2': 300, 'feature1_2': 4, 'feature2_2': 2, 'feature2_1': 2, 'feature1_1': 3, 'similarity': 0.578, 'id_1': 200}]

The output I need:

           100      200      300
    100    1        0.571    0.428
    200    0.571    1        0.578
    300    0.428    0.578    1

Solution

It is not clear to me where id_3 comes from, but below is one way to make your dataframe. The trick is to use numpy to index into the upper and lower triangular portions of the matrix.

In [679]:
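import numpy as np
import pandas as pd
similarities = [x["similarity"] for x in newlistdict]
# Number of items, recovered from the number of pairs m = n*(n-1)/2.
# (len(similarities) equals the item count only by coincidence when n == 3.)
n = int(round((1 + np.sqrt(1 + 8 * len(similarities))) / 2))
names = ['id_' + str(x) for x in range(1, n + 1)]
# Row/column indices of the strict upper triangle, in the same row-major
# order that itertools.combinations produces the pairs.
iuu = np.mask_indices(n, np.triu, 1)
mat = np.eye(n)
mat[iuu] = similarities
# Mirror into the lower triangle by swapping row and column indices;
# assigning through np.tril indices would misplace values for n > 3.
mat[iuu[1], iuu[0]] = similarities
df = pd.DataFrame(mat, columns=names, index=names)
df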

Out[679]:
        id_1        id_2        id_3
id_1    1.000000    0.896082    0.897818
id_2    0.896082    1.000000    0.186298
id_3    0.897818    0.186298    1.000000

(The values differ from your question because I don't know the random seed you used.)
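
If you want the rows and columns labelled with the actual ids (100, 200, 300) rather than positional names, one option is to fill the matrix from the `id_1`/`id_2` fields of each pair record. The following is only a minimal sketch: the `pairs` list is a hand-written stand-in for the output of `add_similarity()` with the feature columns dropped and the similarity values truncated as in the question, and `pos`, `sim`, `sim_df` and `dist` are names made up for illustration. The final line assumes similarities lie in [0, 1], so that `1 - similarity` can serve as a distance.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the records returned by add_similarity()
# (feature columns omitted, similarity values truncated as in the question).
pairs = [
    {"id_1": 100, "id_2": 200, "similarity": 0.571},
    {"id_1": 100, "id_2": 300, "similarity": 0.428},
    {"id_1": 200, "id_2": 300, "similarity": 0.578},
]

# Collect the distinct ids and map each one to a row/column position.
ids = sorted({p["id_1"] for p in pairs} | {p["id_2"] for p in pairs})
pos = {id_: i for i, id_ in enumerate(ids)}

# Start from the identity (self-similarity of 1) and fill both triangles.
sim = np.eye(len(ids))
for p in pairs:
    i, j = pos[p["id_1"]], pos[p["id_2"]]
    sim[i, j] = sim[j, i] = p["similarity"]

# Label rows and columns with the original ids.
sim_df = pd.DataFrame(sim, index=ids, columns=ids)

# Assumption: similarity is in [0, 1], so 1 - similarity behaves as a distance.
dist = 1.0 - sim
```

Because each value is written to both `[i, j]` and `[j, i]`, this does not depend on the order of the pairs. Note that scikit-learn's `DBSCAN` with `metric="precomputed"` expects distances rather than similarities, so it is `dist`, not `sim`, that you would pass to `fit`.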