python, python-2.7, data-processing

How to transform some plain text output into JSON using some rules?


I'm parsing the output of a program that prints two words per line; lines can be duplicated. The output is sorted.

a  1
a  2
b  5
c  6
c  6
d  3
e  1
e  1
e  2
f  0

I want to create a list of dicts that looks like this (using the input data I provided):

[
  {"name": "a", "numbers": [{"number": "1", "duplicated": false},
                            {"number": "2", "duplicated": false}]},
  {"name": "b", "numbers": [{"number": "5", "duplicated": false}]},
  {"name": "c", "numbers": [{"number": "6", "duplicated": true}]},
  {"name": "d", "numbers": [{"number": "3", "duplicated": false}]},
  {"name": "e", "numbers": [{"number": "1", "duplicated": true},
                            {"number": "2", "duplicated": false}]},
  {"name": "f", "numbers": [{"number": "0", "duplicated": false}]}
]

How can I achieve this? If possible, using only the standard library.

Everything I've tried looks big, monstrous and ugly.

There's no code as I could not get any result.


Solution

  • You could build a dict like {name_a: {value1: count1, value2: count2}, ...}, then generate your output from it:

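    from collections import defaultdict

    # Maps each name to a {value: number of occurrences} dict.
    name_to_values = defaultdict(lambda: defaultdict(int))
    with open('data.txt') as f:
        for line in f:
            name, value = line.split()
            name_to_values[name][value] += 1

    out = []
    # Plain dicts are unordered in Python 2.7, so sort explicitly
    # to get a stable order (values are compared as strings).
    for name in sorted(name_to_values):
        d = {'name': name}
        d['numbers'] = [{'number': value, 'duplicated': count > 1}
                        for value, count in sorted(name_to_values[name].items())]
        out.append(d)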
    

    Output:

    print(out)
    
    # [{'name': 'a', 'numbers': [{'number': '1', 'duplicated': False},
    #                            {'number': '2', 'duplicated': False}]},
    #  {'name': 'b', 'numbers': [{'number': '5', 'duplicated': False}]},
    #  {'name': 'c', 'numbers': [{'number': '6', 'duplicated': True}]},
    #  {'name': 'd', 'numbers': [{'number': '3', 'duplicated': False}]},
    #  {'name': 'e', 'numbers': [{'number': '1', 'duplicated': True},
    #                            {'number': '2', 'duplicated': False}]},
    #  {'name': 'f', 'numbers': [{'number': '0', 'duplicated': False}]}]
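
Since the question says the input is already sorted, another possible approach is to group adjacent lines with itertools.groupby and serialize the result with json.dumps, which gives actual JSON (lowercase true/false) rather than a Python repr. The following is only a sketch under some assumptions: the function name parse_pairs and the inline sample data are made up for illustration, every non-empty line is assumed to hold exactly two whitespace-separated fields, and equal lines are assumed to be adjacent (groupby does not merge items that are far apart).

```python
import json
from itertools import groupby
from operator import itemgetter


def parse_pairs(lines):
    """Hypothetical helper: build the nested structure from sorted 'name value' lines."""
    # Skip blank lines; each remaining line is assumed to have exactly two fields.
    pairs = [line.split() for line in lines if line.strip()]
    result = []
    # Outer groupby: one group per run of lines sharing the same name.
    for name, group in groupby(pairs, key=itemgetter(0)):
        numbers = []
        # Inner groupby: one group per run of identical values under that name.
        for value, same in groupby(group, key=itemgetter(1)):
            numbers.append({"number": value,
                            "duplicated": len(list(same)) > 1})
        result.append({"name": name, "numbers": numbers})
    return result


# Sample data copied from the question.
sample = """\
a  1
a  2
b  5
c  6
c  6
d  3
e  1
e  1
e  2
f  0
""".splitlines()

records = parse_pairs(sample)
print(json.dumps(records, indent=2))
```

To read from a file instead, pass the open file object (it yields lines) to parse_pairs. This version keeps names and values in input order without an explicit sort, and it runs on both Python 2.7 and Python 3.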