I'm parsing the output of a program that prints two words per line, and lines can be duplicated. The output is sorted.
a 1
a 2
b 5
c 6
c 6
d 3
e 1
e 1
e 2
f 0
I want to create a list of dicts that looks like this (using the input data I provided):
[
{"name": "a", "numbers": [{"number": "1", "duplicated": false},
                          {"number": "2", "duplicated": false}]},
{"name": "b", "numbers": [{"number": "5", "duplicated": false}]},
{"name": "c", "numbers": [{"number": "6", "duplicated": true}]},
{"name": "d", "numbers": [{"number": "3", "duplicated": false}]},
{"name": "e", "numbers": [{"number": "1", "duplicated": true},
                          {"number": "2", "duplicated": false}]},
{"name": "f", "numbers": [{"number": "0", "duplicated": false}]}
]
How can I achieve this? If possible, without using anything but the standard library.
Everything I've tried looks big, monstrous and ugly.
There's no code as I could not get any result.
You could build a dict like {name_a: {value1: count1, value2: count2}, ...}, then generate your output from it:
from collections import defaultdict

# Maps each name to a dict of {value: number of occurrences}.
name_to_values = defaultdict(lambda: defaultdict(int))
with open('data.txt') as f:
    for line in f:
        name, value = line.split()
        name_to_values[name][value] += 1

# A value counts as duplicated if it was seen more than once for its name.
out = []
for name in name_to_values:
    d = {'name': name}
    d['numbers'] = [{'number': value, 'duplicated': count > 1}
                    for value, count in name_to_values[name].items()]
    out.append(d)
Output:
print(out)
# [{'name': 'a', 'numbers': [{'number': '1', 'duplicated': False}, {'number': '2', 'duplicated': False}]},
# {'name': 'b', 'numbers': [{'number': '5', 'duplicated': False}]},
# {'name': 'c', 'numbers': [{'number': '6', 'duplicated': True}]},
# {'name': 'd', 'numbers': [{'number': '3', 'duplicated': False}]},
# {'name': 'e', 'numbers': [{'number': '1', 'duplicated': True},
# {'number': '2', 'duplicated': False}]},
# {'name': 'f', 'numbers': [{'number': '0', 'duplicated': False}]}]
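
Since the question says the input is already sorted, another standard-library option is itertools.groupby, which avoids building the intermediate counting dict. This is only a sketch of that alternative, not the approach above: the function name group_sorted_lines and the inline sample list are made up for illustration, and it assumes every non-empty line contains exactly two whitespace-separated words.

```python
from itertools import groupby


def group_sorted_lines(lines):
    """Build the list of dicts from already-sorted 'name value' lines.

    Hypothetical helper; assumes exactly two words per non-empty line.
    """
    # Split each non-empty line into a (name, value) tuple.
    pairs = (tuple(line.split()) for line in lines if line.strip())
    out = []
    # Outer groupby: runs of consecutive pairs that share the same name.
    for name, name_group in groupby(pairs, key=lambda pair: pair[0]):
        numbers = []
        # Inner groupby: runs of consecutive identical (name, value) pairs.
        for (_, value), same in groupby(name_group):
            count = sum(1 for _ in same)
            numbers.append({'number': value, 'duplicated': count > 1})
        out.append({'name': name, 'numbers': numbers})
    return out


# Sample data copied from the question.
sample = ['a 1', 'a 2', 'b 5', 'c 6', 'c 6', 'd 3', 'e 1', 'e 1', 'e 2', 'f 0']
result = group_sorted_lines(sample)
```

groupby only merges adjacent equal items, so this relies on the input being sorted; for unsorted input the defaultdict version is the safer choice. If you need JSON text with lowercase true/false as in the question, json.dumps(result) from the standard library produces that.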