Tags: python, list, dataframe, tuples, similarity

Create a list of tuples from a list of m items and an m x m array of similarities


I have a list of three items:

    Items_list = ['a', 'b', 'c']

scikit-learn's cosine_similarity function gives me a 3 x 3 matrix holding the similarity of every pair of the items 'a', 'b' and 'c':

    similarities = [[1, 0.5, 0.2],
                    [0.5, 1, 0.6],
                    [0.2, 0.6, 1]]
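Each entry of such a matrix is the cosine of the angle between two items' feature vectors, which is why it is symmetric and has 1 on the diagonal. Below is a minimal NumPy sketch of the same computation. It assumes made-up feature vectors: the `features` array is hypothetical, because the question does not show the data behind its matrix.

```python
import numpy as np

# Hypothetical feature vectors, one row per item ('a', 'b', 'c').
# These are made-up numbers, not the data behind the question's matrix.
features = np.array([[1.0, 0.0, 2.0],
                     [0.5, 1.0, 1.0],
                     [0.0, 2.0, 1.0]])

# Scale every row to unit length. The dot product of two unit
# vectors equals the cosine of the angle between them.
unit = features / np.linalg.norm(features, axis=1, keepdims=True)

# Entry [i, j] is the cosine similarity between item i and item j,
# which gives an m x m matrix (3 x 3 here).
similarities = unit @ unit.T
print(np.round(similarities, 2))
```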

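I want to create a pandas DataFrame with two columns. Col1 should hold each item, and Col2 should hold a list of (item, similarity) tuples built from that item's row of the matrix. Required output: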

  Col1                          Col2
0    a  [(a, 1), (b, 0.5), (c, 0.2)]
1    b  [(a, 0.5), (b, 1), (c, 0.6)]
2    c  [(a, 0.2), (b, 0.6), (c, 1)]

Solution

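  • Zip each row of the similarity matrix with item_list so that every score is paired with the item it refers to. Then build the DataFrame from the two lists: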

    import pandas as pd
    
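    item_list = ['a', 'b', 'c']
    
    similarities = [[1, 0.5, 0.2],
                    [0.5, 1, 0.6],
                    [0.2, 0.6, 1]]
    
    # Pair every item name with its score in each row; for example, row 0
    # becomes [('a', 1), ('b', 0.5), ('c', 0.2)]
    tuple_similarities = [list(zip(item_list, row)) for row in similarities]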
    
    df = pd.DataFrame({'Col1': item_list,
                       'Col2': tuple_similarities})
    
    print(df)
    

    Output:

      Col1                          Col2
    0    a  [(a, 1), (b, 0.5), (c, 0.2)]
    1    b  [(a, 0.5), (b, 1), (c, 0.6)]
    2    c  [(a, 0.2), (b, 0.6), (c, 1)]
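scikit-learn's cosine_similarity returns a NumPy array rather than nested lists, so zipping its rows directly gives tuples that hold numpy.float64 values. The sketch below shows one way to get plain Python floats instead: call .tolist() on the array before zipping. It assumes the matrix arrives as a NumPy array, and here the values are hard-coded from the question rather than produced by scikit-learn.

```python
import numpy as np
import pandas as pd

item_list = ['a', 'b', 'c']

# Assumption: the matrix arrives as a NumPy array, as it would from
# sklearn.metrics.pairwise.cosine_similarity (values copied from the question).
similarities = np.array([[1, 0.5, 0.2],
                         [0.5, 1, 0.6],
                         [0.2, 0.6, 1]])

# .tolist() converts the 2-D array into nested Python lists of plain floats,
# so each tuple holds a float instead of a numpy.float64.
tuple_similarities = [list(zip(item_list, row)) for row in similarities.tolist()]

df = pd.DataFrame({'Col1': item_list,
                   'Col2': tuple_similarities})
print(df)
```

The array has float dtype, so the printed scores appear as 1.0 rather than 1. Otherwise the DataFrame has the same layout as above.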