python dictionary graph intervals

How can I turn a nested dictionary into a graph?


I have a nested dictionary that looks like this:

{'Track_108': {'Track_3994': [(1, 6)],
               'Track_4118': [(8, 9)],
               'Track_4306': [(25, 26), (28, 30)]},
 'Track_112': {'Track_4007': [(19, 20)]},
 'Track_121': {'Track_4478': [(102, 104)]},
 'Track_130': {'Track_4068': [(132, 134)]},
 'Track_141': {'Track_5088': [(93, 95)],
               'Track_5195': [(103, 104), (106, 107)]}}

The lists are intervals (durations) of a certain event: the first number is the "start-frame" and the second number is the "last-frame", both inclusive. So "Track_3994" has an event with a duration of 6 frames (6 - 1 + 1).

I want to plot a histogram with the duration of events on the x-axis and a count on the y-axis. I need one histogram for the whole dictionary and preferably also one for each track in the first column (the top-level keys).

[Image: schematic example of desired graph]

This would be the graph for the whole dictionary. The y-axis represents how many times a duration occurs in the dictionary. For the data I provided there is only one event with a duration of 6, so that bar would have a height of 1. The bar at 2 on the x-axis would have a height of 5, because there are 5 events with a duration of 2 frames.

For the per-track graphs, each histogram shows only the duration distribution of that track, so these graphs will be a lot smaller. E.g., Track_108 will have a graph with a bar of height 2 for x=2, a bar of height 1 for x=3, and a bar of height 1 for x=6.


Solution

  • To solve the calculation and counting job, you can use something like this:

    from typing import Dict, List, Tuple # just typing hints for used/expected types in functions, could be left out
    
    def calculate_track_event_data(data_dict: Dict[str, List[Tuple[int, int]]]) -> Dict[int, int]:
        """
        Counts the durations for a single track sub-dict (a dict of other tracks, each with a list of intervals as specified in the question).
        Returns a dict with duration to count as key-value pairs.
        """
        hist_plot_data = {}
        for track, track_data in data_dict.items():
            for duration_info in track_data:
                duration = duration_info[1] - duration_info[0] + 1  # calculate duration
                try:
                    hist_plot_data[duration] += 1  # count up for calculated duration
                except KeyError:
                    hist_plot_data[duration] = 1  # add duration if not added yet
        return hist_plot_data
    
    
    def calculate_top_layer_event_data(data_dict: Dict[str,  Dict[str, List[Tuple[int, int]]]]) -> Dict[int, int]:
        """
        Counts the durations across the entire dict.
        Returns a dict with duration to count as key-value pairs.
        """
    
        hist_plot_data = {}
    
        for top_level_track, top_level_track_data in data_dict.items():
            hist_for_track = calculate_track_event_data(top_level_track_data)
            for duration, count in hist_for_track.items():
                try:
                    hist_plot_data[duration] += count  # sum up collected count for calculated duration
                except KeyError:
                    hist_plot_data[duration] = count  # add duration if not added yet
        return hist_plot_data
    

    For the given dict, this results in:

    # Data definition
    data = {'Track_108': {'Track_3994': [(1, 6)],
                          'Track_4118': [(8, 9)],
                          'Track_4306': [(25, 26), (28, 30)]},
            'Track_112': {'Track_4007': [(19, 20)]},
            'Track_121': {'Track_4478': [(102, 104)]},
            'Track_130': {'Track_4068': [(132, 134)]},
            'Track_141': {'Track_5088': [(93, 95)],
                          'Track_5195': [(103, 104), (106, 107)]}}
    
    # Call in code:
    print(calculate_track_event_data(data['Track_108']))
    print(calculate_top_layer_event_data(data))
    
    # Result on output:
    # {6: 1, 2: 2, 3: 1}  <-- Result for Track_108
    # {6: 1, 2: 5, 3: 4}  <-- Result for complete dictionary
    

    To visualize the results, you can use one of Python's plotting libraries, such as Matplotlib (have a look at, e.g., "How to plot a histogram using Matplotlib in Python with a list of data?" or https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html).
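
As a rough sketch of the plotting step, the duration counts could be drawn as a bar chart, one bar per duration. The helper names `count_durations`, `to_bar_series` and `plot_duration_counts` are made up for this illustration and are not part of the answer above; `count_durations` is a compact `collections.Counter` stand-in for the counting functions, included only so the block runs on its own. Matplotlib is assumed to be installed and is imported only inside the plotting function.

```python
from collections import Counter

data = {'Track_108': {'Track_3994': [(1, 6)],
                      'Track_4118': [(8, 9)],
                      'Track_4306': [(25, 26), (28, 30)]},
        'Track_112': {'Track_4007': [(19, 20)]},
        'Track_121': {'Track_4478': [(102, 104)]},
        'Track_130': {'Track_4068': [(132, 134)]},
        'Track_141': {'Track_5088': [(93, 95)],
                      'Track_5195': [(103, 104), (106, 107)]}}


def count_durations(track_data):
    """Count event durations (inclusive frame ranges) for one top-level track."""
    return Counter(
        end - start + 1
        for intervals in track_data.values()
        for start, end in intervals
    )


def to_bar_series(counts):
    """Return (durations, heights), sorted by duration, ready for a bar chart."""
    durations = sorted(counts)
    return durations, [counts[d] for d in durations]


def plot_duration_counts(counts, title):
    """Draw one bar per duration and return the figure (hypothetical helper)."""
    import matplotlib.pyplot as plt  # third-party; assumed to be installed

    durations, heights = to_bar_series(counts)
    fig, ax = plt.subplots()
    ax.bar(durations, heights)
    ax.set_xticks(durations)  # only label durations that actually occur
    ax.set_xlabel("Duration (frames)")
    ax.set_ylabel("Count")
    ax.set_title(title)
    return fig


# Counts per top-level track, and their sum over the whole dictionary.
per_track = {name: count_durations(sub) for name, sub in data.items()}
overall = sum(per_track.values(), Counter())

# To render the figures (requires matplotlib):
# plot_duration_counts(overall, "All tracks")
# for name, counts in per_track.items():
#     plot_duration_counts(counts, name)
# import matplotlib.pyplot as plt; plt.show()
```

Because the counts are already aggregated, `ax.bar` is used here rather than `pyplot.hist`, which expects the raw list of durations and bins them itself. Either approach should give the same picture for integer durations.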