Tags: python, arrays, numpy, python-itertools

How can I save all my produced permutations into a Numpy array?


I'd like to generate all the permutations I get from itertools.product() and save them into a single array. At the moment, I'm trying to get all permutations of length 2 of the characters 'ACGT'. When I use numpy.asarray(), only the final permutation (['T', 'T']) is saved, which I assume is because the array is overwritten on each iteration. I've tried the following:

import itertools as it
import numpy as np

for x in it.product('ACGT', repeat=2):
    array = np.asarray(x)

print(array)
['T' 'T']

I later want to do this for larger "word" lengths, but it's easier to test things out when I only expect 16 outcomes. If I were using R, I'd create an empty vector and append to it sequentially. However, I'm still trying to get the hang of Python. Please advise!


Solution

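  • This does not work because each iteration of the loop builds a new array from a single result of it.product(..) and assigns it to the same name, replacing the previous one. The array is never built from the entire set of results, so only the last tuple survives.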

    You can create the full matrix in one step with:

    np.array(list(it.product('ACGT', repeat=2)))

    Or with meshgrid:

    dna = np.array(list('ACGT'))
    np.transpose(np.meshgrid(dna, dna)).reshape(-1,2)

    Both produce an array that looks like:

    array([['A', 'A'],
           ['A', 'C'],
           ['A', 'G'],
           ['A', 'T'],
           ['C', 'A'],
           ['C', 'C'],
           ['C', 'G'],
           ['C', 'T'],
           ['G', 'A'],
           ['G', 'C'],
           ['G', 'G'],
           ['G', 'T'],
           ['T', 'A'],
           ['T', 'C'],
           ['T', 'G'],
           ['T', 'T']], dtype='<U1')
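
Since the question mentions larger word lengths and an R-style "append as you go" approach, here is a minimal sketch of both. The helper name all_words is hypothetical (it is not part of NumPy or itertools), and the sketch assumes the whole result fits in memory.

```python
import itertools as it
import numpy as np


def all_words(alphabet, k):
    """Return an array with one row per length-k word over `alphabet`.

    Hypothetical helper for illustration; it only wraps it.product.
    """
    return np.array(list(it.product(alphabet, repeat=k)))


# The loop from the question, fixed by collecting every tuple in a list
# and converting to an array once, after the loop has finished.
rows = []
for x in it.product('ACGT', repeat=2):
    rows.append(x)
looped = np.array(rows)

pairs = all_words('ACGT', 2)
triples = all_words('ACGT', 3)

print(pairs.shape)                    # (16, 2)
print(triples.shape)                  # (64, 3)
print(np.array_equal(looped, pairs))  # True
```

Appending to a Python list and converting once is usually preferable to growing a NumPy array inside the loop, because np.append copies the whole array on every call. Note that the number of rows is len(alphabet)**k, so it grows quickly with the word length.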