I have a big dictionary (250k+ keys) like this:
dict = {
0: [apple, green],
1: [banana, yellow],
2: [apple, red],
3: [apple, brown],
4: [kiwi, green],
5: [kiwi, brown],
...
}
1. I want a new dictionary that uses the first element of each list as the key and collects the other elements that share that key into a list. Something like this:
new_dict = {
apple: [green, red, brown],
banana: [yellow],
kiwi: [green, brown],
...
}
2. After that I want to count the number of values for each key (e.g. {apple: 3, banana: 1, kiwi: 2}), and this could be easily achieved with a Counter, so it shouldn't be a problem.
Then, I want to select only the keys that have a certain number of values. For example, if I want to keep only the keys associated with 2 or more values, the final_dict will be this:
final_dict = {
apple:3,
kiwi:2,
....
}
3. Then I want to get the original keys from dict for the elements whose key has at least 2 values, so at the end I will have:
original_keys_with_at_least_2_values = [0, 2, 3, 4, 5]
# Create new_dict like: new_dict = {apple:None, banana:None, kiwi:None,..}
new_dict = {k: None for k in dict.values()[0]}
for k in new_dict.keys():
    for i in dict.values()[0]:
        if i == k:
            new_dict[k] = dict[i][1]
I'm stuck using nested for loops, even though I know Python comprehensions are faster, and I really don't know how to solve it. Any solution or idea would be appreciated.
You can use a defaultdict to group the items by the first entry:
from collections import defaultdict
fruits = defaultdict(list)
data = {
0: ['apple', 'green'],
1: ['banana', 'yellow'],
2: ['apple', 'red'],
3: ['apple', 'brown'],
4: ['kiwi', 'green'],
5: ['kiwi', 'brown']
}
# Group everything after the first element under the first element
for v in data.values():
    fruits[v[0]].extend(v[1:])
print(dict(fruits))
# {'apple': ['green', 'red', 'brown'], 'banana': ['yellow'], 'kiwi': ['green', 'brown']}
If any list has fewer than two entries, you'll need to account for that: an empty list raises IndexError on v[0], and a one-element list creates the key without adding any values.
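One possible way to account for short lists is sketched below. It assumes such records should simply be skipped and reported; the `skipped` list and the two malformed sample records are made up for illustration and are not part of the original data.

```python
from collections import defaultdict

# Hypothetical input: keys 2 and 3 are malformed on purpose.
data = {
    0: ['apple', 'green'],
    1: ['banana', 'yellow'],
    2: ['apple'],   # name only, no value
    3: [],          # empty record
    4: ['apple', 'red'],
}

fruits = defaultdict(list)
skipped = []  # original keys of records too short to use

for key, values in data.items():
    if len(values) < 2:
        # Assumption: short records are ignored rather than raising an error.
        skipped.append(key)
        continue
    name, *rest = values
    fruits[name].extend(rest)

print(dict(fruits))  # {'apple': ['green', 'red'], 'banana': ['yellow']}
print(skipped)       # [2, 3]
```

Whether to skip, raise, or keep the key with an empty list depends on what a short record means in your data.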
Then use a dict comprehension to get the counts, not Counter, as that won't give you the lengths of those lists.
fruits_count = {k: len(v) for k, v in fruits.items()}
fruits_count_with_at_least_2 = {k: v for k, v in fruits_count.items() if v >= 2}
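If only the counts are needed and the grouped lists are not, a Counter can still be used, provided it is fed the first element of each record rather than the grouped dictionary. This is a sketch that assumes every record holds exactly one name and one value, so counting records is the same as counting values; the names `name_counts` and `at_least_2` are chosen here for illustration.

```python
from collections import Counter

data = {
    0: ['apple', 'green'],
    1: ['banana', 'yellow'],
    2: ['apple', 'red'],
    3: ['apple', 'brown'],
    4: ['kiwi', 'green'],
    5: ['kiwi', 'brown'],
}

# Count how many records share each first element.
name_counts = Counter(values[0] for values in data.values())

# Keep only the names that appear in at least 2 records.
at_least_2 = {name: n for name, n in name_counts.items() if n >= 2}

print(name_counts)  # Counter({'apple': 3, 'kiwi': 2, 'banana': 1})
print(at_least_2)   # {'apple': 3, 'kiwi': 2}
```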
And then a loop will be needed to collect the original keys:
original_keys_with_2_count = []
for k, values in data.items():
    fruit = values[0]
    count = fruits_count.get(fruit, -1)
    if count >= 2:
        original_keys_with_2_count.append(k)
print(original_keys_with_2_count)
# [0, 2, 3, 4, 5]
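Since the question asked about comprehensions, the last step could also be written as a list comprehension. The sketch below wraps the whole pipeline in a hypothetical helper, `keys_with_min_count`, which is not from the original post. It makes two passes over the data (one to count, one to filter), so it should stay linear in the number of keys, and it assumes every record is non-empty.

```python
from collections import Counter

def keys_with_min_count(data, min_count=2):
    """Return the original keys whose first element occurs at least min_count times."""
    # First pass: count how often each first element occurs.
    counts = Counter(values[0] for values in data.values())
    # Second pass: keep the keys whose first element is frequent enough.
    return [key for key, values in data.items() if counts[values[0]] >= min_count]

data = {
    0: ['apple', 'green'],
    1: ['banana', 'yellow'],
    2: ['apple', 'red'],
    3: ['apple', 'brown'],
    4: ['kiwi', 'green'],
    5: ['kiwi', 'brown'],
}

print(keys_with_min_count(data))     # [0, 2, 3, 4, 5]
print(keys_with_min_count(data, 3))  # [0, 2, 3]
```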