Tags: python, matrix, combinations, python-itertools

pairwise combinations to matrix/table representation


I think this is probably something people have already solved, and there may even be some built-in functionality I'm missing, so I figured I'd ask before I reinvent the wheel.

Basically, given some pairwise output from itertools.combinations, I'd like to represent it as a matrix/table of each comparison.

So far, I'm roughly up to this point:

from itertools import combinations

def chunks(l, n):
    n = max(1, n)
    return [l[i:i+n] for i in range(0, len(l), n)]
    
x = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)]

[print(i) for i in chunks([i[1]+j[1] for i, j in combinations(x, 2)], len(x)-1)]

This gives me a matrix-style output:

[3, 4, 5, 6]
[5, 6, 7, 7]
[8, 9]
[None, None, None]

I'm not sure where the Nones are coming from just yet as the output of chunks([i[1]+j[1] for i, j in combinations(x, 2)], len(x)-1) is:

[[3, 4, 5, 6], [5, 6, 7, 7], [8, 9]]

But I can look into that later on (but feel free to point out my obvious mistake!)

I'd ideally like to end up with a pairwise matrix, with the names of the comparisons attached too, so that it would appear something like:

    A    B    C    D    E
A        3    4    5    6
B             5    6    7
C                  7    8
D                       9
E   

It's clear my naive approach of chunking by the length of the input data isn't quite right either, as the 7 belonging to the C+D comparison is on the wrong line. I'd forgotten to account for each row having one fewer entry than the row before it.

If there's a better way altogether, I'm happy to change the approach. I've focussed on using itertools for this as it may end up being run over large files, with potentially thousands of comparisons, in a bigger script with other calculations etc. happening, so avoiding self and repeat comparisons is ideal.

Edit:

To add, I'd like to subsequently be able to output the matrix that I depicted, with the row and column names, to a tsv/csv or similar.


Solution

  • This creates a matrix as you describe, using 0's for the "blanks":

    [[(a[1]+b[1] if a[0]<b[0] else 0) for b in x] for a in x]
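
Note that the a[0]<b[0] test only picks out the upper triangle when the names are unique and already in ascending order, and it still visits every (a, b) pair. As a rough sketch of an alternative, assuming the same x as in the question, the grid can be filled by position from itertools.combinations so that each pair is computed once. The function name pairwise_matrix and the blank parameter are made up for this illustration:

```python
from itertools import combinations

x = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)]

def pairwise_matrix(items, blank=0):
    """Fill the upper triangle of an n x n grid, one combination at a time."""
    n = len(items)
    grid = [[blank] * n for _ in range(n)]
    # combinations over the indices yields each (i, j) with i < j exactly once,
    # so self and repeat comparisons are never computed
    for i, j in combinations(range(n), 2):
        grid[i][j] = items[i][1] + items[j][1]
    return grid

matrix = pairwise_matrix(x)
```

For the example data this should give the same grid as the comprehension above, but it does not depend on how the names compare to each other.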
    

    To print it out:

    print("\t".join(['']+[a[0] for a in x]))
    for a in x:
        print("\t".join([a[0]] + [(str(a[1]+b[1]) if a[0]<b[0] else '') for b in x]))
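
For the tsv/csv output mentioned in the question's edit, one possible approach is to hand the same rows to the standard library's csv.writer with a tab delimiter. This is only a sketch: the helper name write_pairwise_tsv is invented here, and it writes to an in-memory buffer so the result can be inspected; a real file handle would work the same way.

```python
import csv
import io
from itertools import combinations

x = [("A", 1), ("B", 2), ("C", 3), ("D", 4), ("E", 5)]

def write_pairwise_tsv(items, handle):
    """Write a labelled upper-triangular table of pairwise sums as TSV."""
    names = [name for name, _ in items]
    n = len(items)
    # Start with empty cells, then fill only the i < j positions
    cells = [[""] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        cells[i][j] = items[i][1] + items[j][1]
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + names)          # header row, blank corner cell
    for name, row in zip(names, cells):
        writer.writerow([name] + row)      # row label followed by its cells

buffer = io.StringIO()
write_pairwise_tsv(x, buffer)
tsv_text = buffer.getvalue()
# To write to disk instead, pass a handle such as
# open("pairwise.tsv", "w", newline="") (the filename is just an example).
```

As for the [None, None, None] line in the question: print returns None, so a list comprehension of print calls evaluates to a list of Nones, which an interactive session then echoes. A plain for loop, as used above, avoids that.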