python list dictionary defaultdict

Merge duplicate values in a dictionary


  • I have a dictionary of weekdays: "0" = Monday, "1" = Tuesday, ...
  • The value for each weekday is a list of time spans

I want to create a new structure in which duplicate time spans are merged: a list of dictionaries, each with a key weekdays containing the weekdays that share the same time spans and a key time_spans containing those time spans.

Input:

{
   "0":[
      [
         "09:00:00",
         "12:00:00"
      ]
   ],
   "1":[
      [
         "09:00:00",
         "12:00:00"
      ]
   ],
   "2":[
      [
         "09:00:00",
         "12:00:00"
      ],
      [
         "12:30:00",
         "15:30:00"
      ]
   ],
   "3":[
      [
         "09:00:00",
         "12:00:00"
      ]
   ],
   "4":[
      [
         "09:00:00",
         "12:00:00"
      ]
   ],
   "5":[
      [
         "09:00:00",
         "12:00:00"
      ],
      [
         "12:30:00",
         "15:30:00"
      ]
   ],
   "6":[
      [
         "09:00:00",
         "12:00:00"
      ],
      [
         "12:30:00",
         "15:30:00"
      ]
   ]
}

Desired output:

[
   {
      "weekdays":[0, 1, 3, 4],
      "time_spans":[
         [
            "09:00:00",
            "12:00:00"
         ]
      ]
   },
   {
      "weekdays":[2, 5, 6],
      "time_spans":[
         [
            "09:00:00",
            "12:00:00"
         ],
         [
            "12:30:00",
            "15:30:00"
         ]
      ]
   }
]

If there is a better solution to this problem I am all ears.

The solutions that I found do not work because they assume that the dict values are not lists. The popular solution seems to be to flip keys and values.

For example: Find dictionary keys with duplicate values

I guess I am too dull to see an obvious solution here...


Solution

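  • Lists are unhashable, so they cannot be used as dictionary keys directly, which is why flipping keys and values fails here. Similar to the established solutions you found, you can instead use the string representation of each value (a list of lists) as the dictionary key and group the weekdays under it.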

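    def timesheetMerge(timesheet):
        output = []
        unique_shifts = {}  # str(time spans) -> index of its group in output
        for key, val in timesheet.items():
            shift = str(val)  # lists are unhashable, so key on their string form
            if shift not in unique_shifts:
                # First time these time spans appear: start a new group.
                unique_shifts[shift] = len(output)
                output.append({"weekdays": [int(key)], "time_spans": val})
            else:
                # Seen before: add this weekday to the existing group.
                output[unique_shifts[shift]]["weekdays"].append(int(key))

        return output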
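
  • As an alternative sketch (not part of the answer above), the same grouping can be done with collections.defaultdict by converting each list of time spans to a tuple of tuples, which is hashable. The function name merge_by_time_spans and the shortened sample input are made up for illustration, and the result order relies on dicts preserving insertion order (Python 3.7+).

```python
from collections import defaultdict


def merge_by_time_spans(timesheet):
    """Group weekday keys that share exactly the same list of time spans."""
    groups = defaultdict(list)
    for day, spans in timesheet.items():
        # Lists are unhashable, so convert the nested lists to nested tuples.
        key = tuple(tuple(span) for span in spans)
        groups[key].append(int(day))
    # Convert the tuple keys back to lists for the output structure.
    return [
        {"weekdays": days, "time_spans": [list(span) for span in key]}
        for key, days in groups.items()
    ]


# Shortened, hypothetical sample in the same shape as the question's input.
timesheet = {
    "0": [["09:00:00", "12:00:00"]],
    "1": [["09:00:00", "12:00:00"]],
    "2": [["09:00:00", "12:00:00"], ["12:30:00", "15:30:00"]],
}
result = merge_by_time_spans(timesheet)
print(result)
```

Unlike the str(val) approach, tuple keys compare the actual values rather than their printed form, so the grouping does not depend on how the lists happen to be rendered as strings.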