python python-3.x list-comprehension nested-loops

Python 3 nested comprehension

Is there a smart list/dictionary comprehension way of getting the intended output below, given the following:

import numpy as np
freq_mat = np.random.randint(2, size=(4, 5))
tokens = ['a', 'b', 'c', 'd', 'e']
labels = ['X', 'S', 'Y', 'S']

The intended output for freq_mat

array([[1, 0, 0, 1, 1],
       [0, 0, 0, 0, 1],
       [1, 0, 1, 1, 0],
       [0, 1, 0, 0, 0]])

should look like the following:

[({'a': True, 'b': False, 'c': False, 'd': True, 'e': True}, 'X'),
 ({'a': False, 'b': False, 'c': False, 'd': False, 'e': True}, 'S'),
 ({'a': True, 'b': False, 'c': True, 'd': True, 'e': False}, 'Y'),
 ({'a': False, 'b': True, 'c': False, 'd': False, 'e': False}, 'S')]

Solution

As you note in your updated post, your original code doesn't work quite right: it adds the same value for every key in a given row - all True or all False. The simplest correction to your original code would be this:

featureset = []
for row, label in zip(freq_mat, labels):
    d = {}
    for key, val in zip(tokens, row):  # The critical bit: pair each token with its own value in this row
        d[key] = val > 0
    featureset.append((d, label))

A more streamlined version, but one that's still quite a bit more readable, I think, than the single-comprehension approach:

featureset = []
for row, label in zip(freq_mat, labels):
    d = {key: val > 0 for key, val in zip(tokens, row)}
    featureset.append((d, label))

Or for the one-liner:

featureset = [({key: val > 0 for key, val in zip(tokens, row)}, label)
              for row, label in zip(freq_mat, labels)]

Personally I'd probably go with the second approach, a compromise between concision and readability. But that's up to you, of course!
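If you want to check the result yourself, here is a minimal, self-contained sketch of the second approach. It swaps the random matrix for a hard-coded copy of the example matrix from the question so the output is reproducible. The `bool(...)` wrapper and the `expected` name are my own additions, not part of your code: `val > 0` on a NumPy integer yields a NumPy boolean scalar, which recent NumPy releases may display as `np.True_` rather than `True`, so converting gives plain Python booleans.

```python
import numpy as np

# Hard-coded copy of the example matrix from the question (instead of random data).
freq_mat = np.array([[1, 0, 0, 1, 1],
                     [0, 0, 0, 0, 1],
                     [1, 0, 1, 1, 0],
                     [0, 1, 0, 0, 0]])
tokens = ['a', 'b', 'c', 'd', 'e']
labels = ['X', 'S', 'Y', 'S']

featureset = []
for row, label in zip(freq_mat, labels):
    # bool(...) is an optional addition: it converts the NumPy boolean scalar
    # into a plain Python bool.
    d = {key: bool(val > 0) for key, val in zip(tokens, row)}
    featureset.append((d, label))

# The intended output from the question, written out for comparison.
expected = [
    ({'a': True, 'b': False, 'c': False, 'd': True, 'e': True}, 'X'),
    ({'a': False, 'b': False, 'c': False, 'd': False, 'e': True}, 'S'),
    ({'a': True, 'b': False, 'c': True, 'd': True, 'e': False}, 'Y'),
    ({'a': False, 'b': True, 'c': False, 'd': False, 'e': False}, 'S'),
]
assert featureset == expected
```

If NumPy booleans are fine for your downstream code, you can drop the `bool(...)` call; the comparison against `expected` should still hold, since NumPy booleans compare equal to Python ones.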