
Search for combinations in JSON nested object


I have a large JSON object. A piece of it is:

data = [
{  
   'make': 'dacia',
   'model': 'x',
   'version': 'A',
   'typ': 'sedan',
   'infos': [
            {'id': 1, 'name': 'steering wheel problems'}, 
            {'id': 32, 'name': 'ABS errors'}
   ]
},
{  
   'make': 'nissan',
   'model': 'z',
   'version': 'B',
   'typ': 'coupe',
   'infos': [
         {'id': 3, 'name': 'throttle problems'},
         {'id': 56, 'name': 'broken handbreak'},
         {'id': 11, 'name': 'missing seatbelts'}
   ]
}
]

I created a list of all possible combinations of infos that might occur in my JSON (one car may have a single info, while another may have many):

import itertools

inf = list(set(i.get('name') for d in data for i in (d['infos'] if isinstance(d['infos'], list) else [d['infos']])))
inf_comb = [combo for n in range(1, len(inf) + 1) for combo in itertools.combinations(inf, n)]
infos_combo = [list(elem) for elem in inf_comb]

Now I need to iterate over the whole JSON data and count how many times each combination in infos_combo occurs, so I wrote this code:

tab = []
s = 0
for x in infos_combo:
   s = sum([1 for k in data if (([i['name'] for i in (k['infos'] if isinstance(k['infos'], list) else [k['infos']])] == x))])
   if s != 0:
     tab.append({'infos': x, 'sum': s})
print(tab)

The problem I'm facing is that tab contains only some of the elements I expect: many more combinations occur in my JSON object and should be counted, but I can't get them. How can this be solved?


Solution

  • Okay, so first you need to get all of the actual "infos" from your JSON data, like so:

    infos = [
        [i["name"] for i in d["infos"]] if isinstance(d["infos"], list) else [d["infos"]["name"]]
        for d in data
    ]
    

    This gives a list of name lists, one per car, which we will use later:

    [['steering wheel problems', 'ABS errors'], ['throttle problems', 'broken handbreak', 'missing seatbelts']]
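
The normalization step can also be written as a small helper, which makes the single-dict case explicit. This is only a sketch: the name names_of is hypothetical, and it assumes that a lone entry is stored as one dict with a "name" key rather than as a one-element list.

```python
def names_of(infos_field):
    """Return the info names as a list, for a dict or a list of dicts."""
    # Assumption: a lone entry is a single dict, not a one-element list.
    entries = infos_field if isinstance(infos_field, list) else [infos_field]
    return [entry["name"] for entry in entries]


sample = [
    {"infos": [{"id": 1, "name": "steering wheel problems"},
               {"id": 32, "name": "ABS errors"}]},
    {"infos": {"id": 3, "name": "throttle problems"}},  # single dict, not a list
]

normalized = [names_of(d["infos"]) for d in sample]
print(normalized)
```

Every element of the result is a list of strings, so later steps never have to check the type again.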
    

    Now, to get all of the combinations, we first flatten the infos list and weed out duplicates (dict.fromkeys keeps the first occurrence of each name, in order):

    unique_infos = list(dict.fromkeys(x for l in infos for x in l))
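
To see why deduplication matters, here is a small sketch with made-up sample lists in which the same name appears on two cars. Flattening alone keeps the repeat, which would later produce duplicate combinations; dict.fromkeys drops it while preserving first-seen order.

```python
# Hypothetical sample: "ABS errors" is reported for two different cars.
infos = [
    ["steering wheel problems", "ABS errors"],
    ["ABS errors", "throttle problems"],
]

# Flattening alone keeps the repeated name.
flat = [name for names in infos for name in names]

# dict.fromkeys removes repeats and keeps the order of first appearance.
unique_infos = list(dict.fromkeys(flat))

print(flat)
print(unique_infos)
```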
    

    To get all of the combinations:

    infos_combo = itertools.chain.from_iterable(
        itertools.combinations(unique_infos, r) for r in range(len(unique_infos) + 1)
    )
    

    which will yield:

    ()
    ('steering wheel problems',)
    ('ABS errors',)
    ('throttle problems',)
    ('broken handbreak',)
    ('missing seatbelts',)
    ('steering wheel problems', 'ABS errors')
    ('steering wheel problems', 'throttle problems')
    ('steering wheel problems', 'broken handbreak')
    ...
    # output truncated for brevity
    ...
    ('steering wheel problems', 'throttle problems', 'broken handbreak', 'missing seatbelts')
    ('ABS errors', 'throttle problems', 'broken handbreak', 'missing seatbelts')
    ('steering wheel problems', 'ABS errors', 'throttle problems', 'broken handbreak', 'missing seatbelts')
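
Two properties of this construction are worth keeping in mind, sketched here with placeholder names: n distinct items produce 2**n combinations (including the empty tuple), and itertools.chain.from_iterable returns a one-shot iterator that is empty after the first pass. The helper all_combinations is a hypothetical wrapper around the expression used in this answer.

```python
import itertools


def all_combinations(items):
    """Return an iterator over every combination of items, empty tuple included."""
    return itertools.chain.from_iterable(
        itertools.combinations(items, r) for r in range(len(items) + 1)
    )


unique_infos = ["a", "b", "c"]  # placeholder names for illustration

combos = all_combinations(unique_infos)
first_pass = list(combos)   # consumes the iterator
second_pass = list(combos)  # nothing left to iterate over

print(len(first_pass), len(second_pass))
```

If the combinations are needed more than once, store them in a list first. Because the count doubles with every additional distinct name, this approach is only practical for a small number of names.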
    
    

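    After that, it's a matter of counting how often each combination appears in the original infos list. Note that infos.count compares whole lists, so a combination only matches a car whose infos are exactly those names in the same order: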

    occurences = {}
    for combo in infos_combo:
        occurences[combo] = infos.count(list(combo))
    
    print(occurences)
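
Because infos.count(list(combo)) relies on list equality, it is sensitive to order and only counts cars whose infos match a combination exactly. If the goal is instead to count a combination whenever it appears on a car, regardless of order or of other infos present, one possible variation is sketched below. It is an alternative technique, not part of the original answer: it uses frozenset and collections.Counter, and the sample lists are made up for illustration.

```python
import itertools
from collections import Counter

# Hypothetical sample: the same pair appears in different orders and as a subset.
infos = [
    ["steering wheel problems", "ABS errors"],
    ["ABS errors", "steering wheel problems"],
    ["ABS errors", "throttle problems", "steering wheel problems"],
]

# Exact match, ignoring order: count each car's complete set of infos.
exact_counts = Counter(frozenset(names) for names in infos)

# Subset match: count every non-empty combination contained in each car's infos.
subset_counts = Counter()
for names in infos:
    unique_names = sorted(set(names))  # sorting gives each combination one canonical key
    for r in range(1, len(unique_names) + 1):
        for combo in itertools.combinations(unique_names, r):
            subset_counts[combo] += 1

pair = ("ABS errors", "steering wheel problems")
print(exact_counts[frozenset(pair)])
print(subset_counts[pair])
```

This version enumerates combinations per car rather than over all distinct names, so only combinations that actually occur are ever generated.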
    

    The full code:

    import itertools
    
    data = [
        {
            "make": "dacia",
            "model": "x",
            "version": "A",
            "typ": "sedan",
            "infos": [
                {"id": 1, "name": "steering wheel problems"},
                {"id": 32, "name": "ABS errors"},
            ],
        },
        {
            "make": "nissan",
            "model": "z",
            "version": "B",
            "typ": "coupe",
            "infos": [
                {"id": 3, "name": "throttle problems"},
                {"id": 56, "name": "broken handbreak"},
                {"id": 11, "name": "missing seatbelts"},
            ],
        },
    ]
    
    infos = [
        [i["name"] for i in d["infos"]] if isinstance(d["infos"], list) else [d["infos"]["name"]]
        for d in data
    ]
    
    # Flatten and drop duplicate names, keeping first-seen order
    unique_infos = list(dict.fromkeys(x for l in infos for x in l))
    
    infos_combo = itertools.chain.from_iterable(
        itertools.combinations(unique_infos, r) for r in range(len(unique_infos) + 1)
    )
    
    occurences = {}
    for combo in infos_combo:
        occurences[combo] = infos.count(list(combo))
    
    print(occurences)
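
The resulting dictionary contains a zero for every combination that never occurs. To get the shape used in the question (a list of {'infos': ..., 'sum': ...} entries with non-zero counts only), a filtering step along these lines could follow. The dictionary below is a small hand-written excerpt of the result for the sample data, used only for illustration.

```python
# Hand-written excerpt of the result for the sample data.
occurences = {
    (): 0,
    ("steering wheel problems",): 0,
    ("steering wheel problems", "ABS errors"): 1,
    ("throttle problems", "broken handbreak", "missing seatbelts"): 1,
}

# Keep only combinations that actually occur, in the question's output shape.
tab = [
    {"infos": list(combo), "sum": count}
    for combo, count in occurences.items()
    if count
]

print(tab)
```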