I have a nested list that I need to chain, then run metrics, then "unchain" back into its original nested format. Here is example data to illustrate:
from itertools import chain
nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain(*nested_list))
print("chained_list: \n", chained_list)
metrics_list = [str(chained_list[x]) + '_score'
                for x in range(len(chained_list))]
print("metrics_list: \n", metrics_list)
zipped_scores = list(zip(chained_list, metrics_list))
print("zipped_scores: \n", zipped_scores)
unchain_function = '????'
chained_list:
['x', 'xx', 'xxx', 'yy', 'yyy', 'y', 'yyyy', 'zz', 'z']
metrics_list:
['x_score', 'xx_score', 'xxx_score', 'yy_score', 'yyy_score', 'y_score', 'yyyy_score', 'zz_score', 'z_score']
zipped_scores:
[('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score'), ('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score'), ('zz', 'zz_score'), ('z', 'z_score')]
Is there a Python function or Pythonic way to write an "unchain_function" that produces this desired output?
[
    [
        ('x', 'x_score'),
        ('xx', 'xx_score'),
        ('xxx', 'xxx_score')
    ],
    [
        ('yy', 'yy_score'),
        ('yyy', 'yyy_score'),
        ('y', 'y_score'),
        ('yyyy', 'yyyy_score')
    ],
    [
        ('zz', 'zz_score'),
        ('z', 'z_score')
    ]
]
(background: this is for running metrics on lists having lengths greater than 100,000)
I dunno how Pythonic this is, but it should work. Long story short, we're using a Wrapper class to turn an immutable value (which can't be changed in place, only replaced) into a mutable object (so we can hold multiple references to the same object, each organized differently).
We create an identical nested list, except that each value is a Wrapper of the corresponding value from the original list. Then we chain the wrapper list the same way the original list was chained, so the nested and chained wrapper lists refer to the same Wrapper objects. We copy the results from the processed chained list onto the chained wrappers, then read those results back through the nested wrapper list and unwrap them.
I think that using an explicit and simple class called Wrapper is easier to understand, but you could do essentially the same thing by using a singleton list to contain the variable instead of an instance of Wrapper.
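For what it's worth, the singleton-list variant might look roughly like the sketch below. Each value sits in its own one-element list, which stands in for a Wrapper instance. The names `nested_cells` and `chained_cells` are made up for illustration, and the loop that builds the score tuples is only a stand-in for whatever the real metric step is.

```python
from itertools import chain

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]

# Each value lives in its own one-element list, which plays the role of Wrapper.
nested_cells = [[[elem] for elem in sublist] for sublist in nested_list]

# Flatten one level only: the result holds the very same one-element lists.
chained_cells = list(chain.from_iterable(nested_cells))

# Stand-in for the real metric step (hypothetical; mirrors the example data).
for cell in chained_cells:
    cell[0] = (cell[0], str(cell[0]) + '_score')

# Read the updated values back through the nested structure.
unchained_list = [[cell[0] for cell in sublist] for sublist in nested_cells]
print(unchained_list)
```

Because both lists point at the same inner lists, writing to `cell[0]` through the chained view is visible through the nested view.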
from itertools import chain
nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain(*nested_list))
metrics_list = [str(chained_list[x]) +'_score' for x in range(len(chained_list))]
zipped_scores = list(zip(chained_list, metrics_list))
# create a simple Wrapper class, so we can essentially have a mutable primitive.
# We can put the Wrapper into two different lists, and modify its value without
# overwriting it.
class Wrapper:
    def __init__(self, value):
        self.value = value
# create a 'duplicate list' of the nested and chained lists, respectively,
# such that each element of these lists is a Wrapper of the corresponding
# element in the above lists
nested_wrappers = [[Wrapper(elem) for elem in sublist] for sublist in nested_list]
chained_wrappers = list(chain(*nested_wrappers))
# now we have two references to the same MUTABLE Wrapper for each element of
# the original lists - one nested, and one chained. If we change a property
# of the chained Wrapper, the change will reflect on the corresponding nested
# Wrapper. Copy the changes from the zipped scores onto the chained wrappers
for score, wrapper in zip(zipped_scores, chained_wrappers):
    wrapper.value = score
# then extract the values in the unchained list of the same wrappers, thus
# preserving both the changes and the original nested organization
unchained_list = [[wrapper.value for wrapper in sublist] for sublist in nested_wrappers]
This ends with unchained_list equal to the following:
[[('x', 'x_score'), ('xx', 'xx_score'), ('xxx', 'xxx_score')], [('yy', 'yy_score'), ('yyy', 'yyy_score'), ('y', 'y_score'), ('yyyy', 'yyyy_score')], [('zz', 'zz_score'), ('z', 'z_score')]]
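If the wrappers feel like too much machinery, a different technique (not the Wrapper approach above) is to regroup the flat results using the sublist lengths of the original nested list. This is only a sketch: the helper name `unchain` is made up, and it assumes the nesting is one level deep and that the processed list has the same order and number of items as the chained list. It makes a single pass over the data, so it should be fine for lists in the 100,000+ range.

```python
from itertools import chain, islice

def unchain(flat, template):
    """Regroup `flat` into sublists whose lengths match those in `template`."""
    it = iter(flat)
    # islice consumes the next len(sublist) items from the shared iterator.
    return [list(islice(it, len(sublist))) for sublist in template]

nested_list = [['x', 'xx', 'xxx'], ['yy', 'yyy', 'y', 'yyyy'], ['zz', 'z']]
chained_list = list(chain.from_iterable(nested_list))

# Stand-in for the real metric step, mirroring the example data.
zipped_scores = [(item, str(item) + '_score') for item in chained_list]

unchained_list = unchain(zipped_scores, nested_list)
print(unchained_list)
```

The key detail is that every `islice` call pulls from the same iterator, so each sublist picks up where the previous one stopped.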