python dictionary autovivification

How to create nested dictionaries with duplicate keys in Python


I want to create a data structure with nested dictionaries and duplicate keys. A detailed example:

data['State1']['Landon']['abc Area'] = 'BOB'
data['State1']['Landon']['abc Area'] = 'SAM'
data['State1']['Landon']['xyz Area'] = 'John'
data['State2']['New York']['hjk Area'] = 'Ricky'

# Iterating over data['State1'].keys(), I should get ['Landon', 'Landon', 'Landon']
# Iterating over data['State1']['Landon'].keys(), I should get ['abc Area', 'abc Area', 'xyz Area']

Currently, to store the data, I use an extra counter key:

data = AutoVivification()
data[state][city][area][counter] = ID

But to collect all entries (duplicates included) for a city or area, I have to nest loops all the way down to the counter key:

cityList, areaList = [], []
for city in data['State1']:
    for area in data['State1'][city]:
        # each counter key holds exactly one ID, so add one entry per counter
        for counter in data['State1'][city][area]:
            cityList.append(city)
            areaList.append(area)

For nested dictionaries, I found the following code posted by nosklo:

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

And for a dictionary with duplicate keys, I found this code posted by Scorpil:

class Dictlist(dict):  
    def __setitem__(self, key, value):  
        try:   
            self[key]   
        except KeyError:   
            super(Dictlist, self).__setitem__(key, [])   
        self[key].append(value)  

How can I merge the AutoVivification and Dictlist classes? Or is there a more Pythonic way to handle this scenario?


Solution

  • One more example using defaultdict:

    from collections import defaultdict
    
    
    data = defaultdict(  # State
        lambda: defaultdict(  # City
            lambda: defaultdict(list)  # Area
        )
    )
    
    
    data['State']['City']['Area'].append('area 1')
    data['State']['City']['Area'].append('area 2')
    data['State']['City']['Area'].append('area 2')
    
    
    areas = data['State']['City']['Area']
    print(areas)  # ['area 1', 'area 2', 'area 2']
    
    total = len(areas)
    print(total)  # 3
    

    To get the lists of items you want using this solution:

    data['State1']['Landon']['abc Area'].append('BOB')
    data['State1']['Landon']['abc Area'].append('SAM')
    data['State1']['Landon']['xyz Area'].append('John')
    data['State2']['New York']['hjk Area'].append('Ricky')
    
    
    def items_in(d):
        res = []
        if isinstance(d, list):
            res.extend(d)
        elif isinstance(d, dict):
            for k, v in d.items():
                res.extend([k] * len(items_in(v)))
        else:
            raise ValueError('Unknown data')
        return res
    
    
    print(items_in(data['State1']))  # ['Landon', 'Landon', 'Landon']
    print(items_in(data['State1']['Landon']))  # ['abc Area', 'abc Area', 'xyz Area'] (insertion order, Python 3.7+)
    print(items_in(data['State1']['Landon']['abc Area']))  # ['BOB', 'SAM']
    print(items_in(data['State1']['Landon']['xyz Area']))  # ['John']
    
    print(items_in(data['State2']))  # ['New York']
    print(items_in(data['State2']['New York']))  # ['hjk Area']
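
For the literal question of merging the two classes, one possible approach is a single dict subclass in which reading a missing key creates a nested level and assigning to a key appends to a list. The class name `NestedDictlist` is made up for illustration and comes from neither original post; treat this as a minimal sketch, not a tested library.

```python
class NestedDictlist(dict):
    """Hypothetical merge of AutoVivification and Dictlist (illustrative name)."""

    def __getitem__(self, key):
        try:
            return dict.__getitem__(self, key)
        except KeyError:
            # Reading a missing key creates a nested level. dict.__setitem__ is
            # called directly so the appending __setitem__ below is bypassed.
            child = type(self)()
            dict.__setitem__(self, key, child)
            return child

    def __setitem__(self, key, value):
        # Assignment never overwrites: values for the same key collect in a list.
        if key not in self:
            dict.__setitem__(self, key, [])
        dict.__getitem__(self, key).append(value)


data = NestedDictlist()
data['State1']['Landon']['abc Area'] = 'BOB'
data['State1']['Landon']['abc Area'] = 'SAM'
data['State1']['Landon']['xyz Area'] = 'John'

print(data['State1']['Landon']['abc Area'])  # ['BOB', 'SAM']
print(data['State1']['Landon']['xyz Area'])  # ['John']
```

One caveat with this sketch: a key that was first read (and therefore holds a nested dictionary) cannot later be assigned to, because the nested dictionary has no `append` method and the assignment raises `AttributeError`. The `defaultdict` version avoids that ambiguity by fixing the depth up front, which is one reason it may be the simpler choice.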