I have multiple lists of features, stored as strings, that I want to analyze. For example:
[["0.5", "0.4", "disabled", "0.7", "disabled"], ["feature1", "feature2", "feature4", "feature1", "feature3"]]
I know how to convert strings like "0.5" to floats, but is there a way to "normalize" such lists to integer or float values (each list independently in my case)? I would like to get something like this:
[[2, 1, 0, 3, 0], [0, 1, 3, 0, 2]]
Does anyone know how to achieve this? Unfortunately, I couldn't find anything related to this problem yet.
Use a dictionary and a counter to give IDs to new values and remember past IDs:
import itertools, collections

def norm(lst):
    # Each unseen key pulls the next integer from the counter; seen keys reuse their ID.
    d = collections.defaultdict(itertools.count().__next__)
    return [d[s] for s in lst]

lst = [["0.5", "0.4", "disabled", "0.7", "disabled"],
["feature1", "feature2", "feature4", "feature1", "feature3"]]
print(list(map(norm, lst)))
# [[0, 1, 2, 3, 2], [0, 1, 2, 0, 3]]
Or by enumerating sorted unique values; note, however, that "disabled"
sorts after the numeric values, because everything is compared as a string:
def norm_sort(lst):
    d = {x: i for i, x in enumerate(sorted(set(lst)))}
    return [d[s] for s in lst]

print(list(map(norm_sort, lst)))
# [[1, 0, 3, 2, 3], [0, 1, 3, 0, 2]]
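
If you need exactly the numbering from the question, with "disabled" mapped to 0 and the numeric strings ordered by value, one possible sketch is to pass a custom sort key. The names sort_key and norm_numeric are made up for this illustration, and the rule "non-numeric strings first, then numbers by value" is my assumption about what you want. Be aware that strings such as "nan" or "inf" also parse as floats, so this would need extra care if they can occur in your data.

```python
def sort_key(s):
    # Hypothetical helper: non-numeric strings sort first (alphabetically),
    # numeric strings sort after them, ordered by their float value.
    try:
        return (1, float(s), s)
    except ValueError:
        return (0, 0.0, s)

def norm_numeric(lst):
    # Enumerate the unique values in the custom order and look up each element.
    d = {x: i for i, x in enumerate(sorted(set(lst), key=sort_key))}
    return [d[s] for s in lst]

lst = [["0.5", "0.4", "disabled", "0.7", "disabled"],
       ["feature1", "feature2", "feature4", "feature1", "feature3"]]
print([norm_numeric(sub) for sub in lst])
# [[2, 1, 0, 3, 0], [0, 1, 3, 0, 2]]
```

Because the numeric strings are compared as floats here, "10" would sort after "9", which plain string sorting would not give you.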