python numpy counter python-itertools

Calculate the individual frequency of each element of a pair taken from two columns of a numpy matrix in Python


Working with this numpy matrix:

>>> print matrix
[['L' 'G' 'T' 'G' 'A' 'P' 'V' 'I']
 ['A' 'A' 'S' 'G' 'P' 'S' 'S' 'G']
 ['A' 'A' 'S' 'G' 'P' 'S' 'S' 'G']
 ['G' 'L' 'T' 'G' 'A' 'P' 'V' 'I']]

I already have this piece of code:

for i, j in itertools.combinations(range(len(matrix.T)), 2):
    c = matrix[:, [i,j]]
    counts = collections.Counter(map(tuple,c))
    print 'columns {} and {}'.format(i,j)
    for AB in counts:
      freq_AB = float(float(counts[AB])/len(c))
      print 'Frequency of {} = {}'.format(AB, freq_AB)
    print

which produces

columns 0 and 1
Frequency of ('A', 'A') = 0.5
Frequency of ('G', 'L') = 0.25
Frequency of ('L', 'G') = 0.25

columns 0 and 2
Frequency of ('A', 'S') = 0.5
Frequency of ('G', 'T') = 0.25
Frequency of ('L', 'T') = 0.25

[...]

What I would like to add is, for each pair of letters found in columns i and j, the frequency of each individual letter within its own column (the first letter in column i, the second in column j). That is, output similar to the following:

columns 0 and 1
Frequency of ('A', 'A') = 0.5
  Freq of 'A' in column 0 = 0.5
  Freq of 'A' in column 1 = 0.5
Frequency of ('G', 'L') = 0.25
  Freq of 'G' in column 0 = 0.25
  Freq of 'L' in column 1 = 0.25
Frequency of ('L', 'G') = 0.25
  Freq of 'L' in column 0 = 0.25
  Freq of 'G' in column 1 = 0.25

columns 0 and 2
Frequency of ('A', 'S') = 0.5
  Freq of 'A' in column 0 = 0.5
  Freq of 'S' in column 2 = 0.5
Frequency of ('G', 'T') = 0.25
  Freq of 'G' in column 0 = 0.25
  Freq of 'T' in column 2 = 0.5
Frequency of ('L', 'T') = 0.25
  Freq of 'L' in column 0 = 0.25
  Freq of 'T' in column 2 = 0.5

[...]

Any help would be greatly appreciated


Solution

  • How about extending the same approach: count each of the two columns with its own Counter, then look up each letter of the pair in the matching column's counts:

    import collections
    import itertools

    for i, j in itertools.combinations(range(len(matrix.T)), 2):
        c = matrix[:, [i, j]]
        n_rows = float(len(c))  # float so Python 2 division is not truncated
        combined_counts = collections.Counter(map(tuple, c))
        # Per-column counts for the two selected columns
        first_column_counts = collections.Counter(c[:, 0])
        second_column_counts = collections.Counter(c[:, 1])
        print 'columns {} and {}'.format(i, j)
        for AB in combined_counts:
            freq_AB = combined_counts[AB] / n_rows
            print 'Frequency of {} = {}'.format(AB, freq_AB)
            freq_A = first_column_counts[AB[0]] / n_rows
            print "  Freq of '{}' in column {} = {}".format(AB[0], i, freq_A)
            freq_B = second_column_counts[AB[1]] / n_rows
            # The second letter comes from column j, not column i
            print "  Freq of '{}' in column {} = {}".format(AB[1], j, freq_B)
        print
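
For anyone on Python 3, the same idea can be sketched as a function that returns the frequencies instead of printing them. This is only an illustration under some assumptions: the name `pair_frequencies` and the shape of the returned dictionary are made up for this example, the matrix is assumed to be non-empty, and it is written against a plain list of rows (a 2-D numpy array should also work, since it iterates row by row). Each column is counted once up front rather than once per column pair.

```python
import collections
import itertools


def pair_frequencies(matrix):
    """Hypothetical helper, not from the original post.

    Returns {(i, j): {(a, b): (freq_ab, freq_a_in_col_i, freq_b_in_col_j)}}.
    Assumes a non-empty, rectangular matrix given as a sequence of rows.
    """
    columns = list(zip(*matrix))  # transpose: one tuple per column
    n_rows = len(columns[0])
    # Count every column once instead of recounting it for each pair.
    column_counts = [collections.Counter(col) for col in columns]

    result = {}
    for i, j in itertools.combinations(range(len(columns)), 2):
        # Count the (letter in column i, letter in column j) pairs row by row.
        pair_counts = collections.Counter(zip(columns[i], columns[j]))
        result[(i, j)] = {
            (a, b): (
                count / n_rows,                # frequency of the pair
                column_counts[i][a] / n_rows,  # frequency of a in column i
                column_counts[j][b] / n_rows,  # frequency of b in column j
            )
            for (a, b), count in pair_counts.items()
        }
    return result


# The matrix from the question, written as a list of rows.
matrix = [
    list("LGTGAPVI"),
    list("AASGPSSG"),
    list("AASGPSSG"),
    list("GLTGAPVI"),
]

freqs = pair_frequencies(matrix)
for (a, b), (f_ab, f_a, f_b) in sorted(freqs[(0, 2)].items()):
    print("Frequency of {} = {}".format((a, b), f_ab))
    print("  Freq of '{}' in column 0 = {}".format(a, f_a))
    print("  Freq of '{}' in column 2 = {}".format(b, f_b))
```

Returning a dictionary keeps the counting separate from the formatting, so the loop at the end can print whichever column pair is of interest in the layout shown in the question.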