Tags: python, python-3.x, dictionary, ordereddictionary

Create a 3-level dictionary from a list of strings with multiple unique values for each key


I have a list of text strings from which I need to build a tree, and as I understand it, the proper data structure for this is a dictionary. Each string has a fixed length and all elements share the same format, so no additional checks are necessary. Each element of the list is a date in the format DD/MM/YYYY. The years should be at the root of the tree (the keys, with no duplicates). Each year maps to one or more months (no duplicate months within the same year), and each month maps to one or more days (no duplicate days within the same month).

An example of the list of strings:

data = ['04/02/2018', '05/02/2018', '06/02/2018', '01/03/2018', '10/03/2018', '08/09/2017', '09/09/2017', '11/10/2017', '11/12/2017', '14/06/2018', '15/06/2018', '24/07/2018', '26/07/2018', '30/08/2018', '31/08/2018', '01/09/2018']

Besides a solution, if anyone can provide one, I would also like an explanation so that I understand how it works.

This is what I have written so far. It is clearly wrong, because the result is a dictionary with only 2 items: the last date of each year.

d = {}
for item in data:
    rec = item.split('/')
    d.update({rec[2]:{rec[1]:(rec[0])}})
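To see why this attempt keeps only one date per year, here is a small sketch that runs the same loop on a shortened sample of the data (the sample is chosen for illustration). Each `d.update({year: {...}})` call assigns a brand-new inner dict to the year key instead of merging into the existing one, so earlier months and days for that year are discarded. Note also that `(rec[0])` is just a parenthesised string, not a tuple or a list.

```python
# Shortened sample of the question's data, for illustration only.
data = ['04/02/2018', '05/02/2018', '08/09/2017', '11/12/2017', '01/09/2018']

d = {}
for item in data:
    rec = item.split('/')  # ['DD', 'MM', 'YYYY']
    # update() replaces the whole value stored under the year key,
    # throwing away any months that were stored there before.
    d.update({rec[2]: {rec[1]: (rec[0])}})

print(d)  # {'2018': {'09': '01'}, '2017': {'12': '11'}}
```

Only the last date seen for each year survives, which matches the symptom described above.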

The desired output for that data looks like this:

{'2017': {'09': ['08', '09'], '10': ['11'], '12': ['11']},
 '2018': {'02': ['04', '05', '06'],
          '03': ['01', '10'],
          '06': ['14', '15'],
          '07': ['24', '26'],
          '08': ['30', '31'],
          '09': ['01']}}

Solution

  • There are various ways to achieve this. You could use a defaultdict from the collections module, but it can also be done using the plain dict.setdefault method.

    setdefault(key[, default])

    If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.

    We loop over the data, splitting it into day, month, and year strings. Then we look in the base tree for the year key, and if it doesn't exist we create a new empty dict for it. Then we look in that year dict for a month key, creating a new list for it if it doesn't exist. Finally we append the day string to the month list.

    from pprint import pprint
    
    data = [
        '04/02/2018', '05/02/2018', '06/02/2018', '01/03/2018', '10/03/2018', '08/09/2017', '09/09/2017',
        '11/10/2017', '11/12/2017', '14/06/2018', '15/06/2018', '24/07/2018', '26/07/2018', '30/08/2018',
        '31/08/2018', '01/09/2018'
    ]
    
    tree = {}
    
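    for s in data:
        day, mon, year = s.split('/')       # 'DD/MM/YYYY' -> 'DD', 'MM', 'YYYY'
        ydict = tree.setdefault(year, {})   # month dict for this year, created if missing
        mlist = ydict.setdefault(mon, [])   # day list for this month, created if missing
        mlist.append(day)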
    
    pprint(tree)
    

    output

    {'2017': {'09': ['08', '09'], '10': ['11'], '12': ['11']},
     '2018': {'02': ['04', '05', '06'],
              '03': ['01', '10'],
              '06': ['14', '15'],
              '07': ['24', '26'],
              '08': ['30', '31'],
              '09': ['01']}}
    

    We can combine the 3 steps of the main loop into one line, but it's a bit harder to read:

    for s in data:
        day, mon, year = s.split('/')
        tree.setdefault(year, {}).setdefault(mon, []).append(day)
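The answer mentions collections.defaultdict as an alternative but does not show it. A minimal sketch, assuming the same `data` list: an outer `defaultdict` whose factory builds a `defaultdict(list)` creates the year and month levels automatically on first access, so the loop body reduces to a single `append`. The nested defaultdicts are converted back to plain dicts at the end so that the result behaves like the `setdefault` version.

```python
from collections import defaultdict
from pprint import pprint

data = [
    '04/02/2018', '05/02/2018', '06/02/2018', '01/03/2018', '10/03/2018', '08/09/2017', '09/09/2017',
    '11/10/2017', '11/12/2017', '14/06/2018', '15/06/2018', '24/07/2018', '26/07/2018', '30/08/2018',
    '31/08/2018', '01/09/2018'
]

# Outer level: year -> defaultdict; inner level: month -> list of days.
tree = defaultdict(lambda: defaultdict(list))

for s in data:
    day, mon, year = s.split('/')
    # Missing year and month keys are created automatically on first access.
    tree[year][mon].append(day)

# Convert back to plain dicts so that looking up a missing key later raises
# KeyError instead of silently inserting an empty entry.
tree = {year: dict(months) for year, months in tree.items()}

pprint(tree)
```

The `lambda` is needed because `defaultdict` expects a callable factory; writing `defaultdict(defaultdict(list))` raises a TypeError, since a defaultdict instance is not callable.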