Is there a smart list/dictionary comprehension way of getting the intended output below, given the following:
import numpy as np
freq_mat = np.random.randint(2, size=(4, 5))
tokens = ['a', 'b', 'c', 'd', 'e']
labels = ['X', 'S', 'Y', 'S']
The intended output for freq_mat
array([[1, 0, 0, 1, 1],
       [0, 0, 0, 0, 1],
       [1, 0, 1, 1, 0],
       [0, 1, 0, 0, 0]])
should look like the following:
[({'a': True, 'b': False, 'c': False, 'd': True, 'e': True}, 'X'),
({'a': False, 'b': False, 'c': False, 'd': False, 'e': True}, 'S'),
({'a': True, 'b': False, 'c': True, 'd': True, 'e': False}, 'Y'),
({'a': False, 'b': True, 'c': False, 'd': False, 'e': False}, 'S')]
As you note in your updated post, your original code doesn't work quite right: it adds the same value for every key in a given row, either all True or all False. The simplest correction to your original code would be this:
featureset = []
for row, label in zip(freq_mat, labels):
    d = dict()
    for key, val in zip(tokens, row):  # The critical bit
        d[key] = val > 0
    featureset.append((d, label))
A more streamlined version, which I think is still quite a bit more readable than the single-comprehension approach:
featureset = []
for row, label in zip(freq_mat, labels):
    d = {key: val > 0 for key, val in zip(tokens, row)}
    featureset.append((d, label))
Or for the one-liner:
featureset = [({key: val > 0 for key, val in zip(tokens, row)}, label)
              for row, label in zip(freq_mat, labels)]
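If you want to check any of these against your expected output, here is a minimal self-contained sketch that uses the fixed matrix from your question instead of random values. One assumption on my part: I wrap the comparison in bool(), which is not in the versions above. Comparing a NumPy scalar with val > 0 gives a NumPy boolean rather than a plain Python bool, and depending on your NumPy version that may print differently from True/False.

```python
import numpy as np

# Fixed matrix from the question, so the result is reproducible
freq_mat = np.array([[1, 0, 0, 1, 1],
                     [0, 0, 0, 0, 1],
                     [1, 0, 1, 1, 0],
                     [0, 1, 0, 0, 0]])
tokens = ['a', 'b', 'c', 'd', 'e']
labels = ['X', 'S', 'Y', 'S']

# bool(...) is my addition: it converts NumPy's boolean scalar
# into a plain Python bool so the dicts print as True/False
featureset = [({key: bool(val > 0) for key, val in zip(tokens, row)}, label)
              for row, label in zip(freq_mat, labels)]

# Print one (features, label) pair per line
for features, label in featureset:
    print(features, label)
```

If plain Python bools don't matter for what you do with featureset afterwards, you can drop the bool() call and keep val > 0 as written above.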
Personally, I'd probably go with the second approach, a compromise between concision and readability. But that's up to you, of course!