Tags: python, json, dictionary, data-structures, nested

Generate a dictionary with two-level keys from list


I am struggling to build the desired data structure. (Note: a pandas implementation is preferred.)

Currently I have the following lists of dictionaries:

list1 = [
{'ip': '11.22.33.44', 'timestamp': 1665480231699, 'message': '{"body": "Idle time larger than time period. retry:0"}', 'ingestionTime': 1665480263198},
{'ip': '11.22.33.42', 'timestamp': 1665480231698, 'message': '{"body": "Idle time larger than time period. retry:5"}', 'ingestionTime': 1665480263198}, 
{'ip': '11.22.33.44', 'timestamp': 1665480231698, 'message': '{"body": "Idle time larger than time period. retry:0"}', 'ingestionTime': 1665480263198}
]
whitelist_metadata = [
  {
    'LogLevel': 'WARNING',
    'SpecificVersion': 'None',
    'TimeInterval(Min)': 1,
    'MetricMsg': 'DDR: XXXX count got lost',
    'AllowedOccurrenceInTimeInterval': 0  # 0 means this message is always allowed
  },
  {
    'LogLevel': 'WARNING',
    'SpecificVersion': 'None',
    'TimeInterval(Min)': 1,
    'MetricMsg': 'Idle time larger than XXX time. retry: \\d ',
    'AllowedOccurrenceInTimeInterval': 5  # this message is allowed only if it occurs no more than 5 times within 1 minute
  }
]

And my desired output is

{
  '11.22.33.42': {
    1665480231698: ['{"body": "Idle time larger than time period. retry:5"}']
  },
  '11.22.33.44': {
    1665480231698: ['{"body": "Idle time larger than time period. retry:0"}'],
    1665480231699: ['{"body": "Idle time larger than time period. retry:0"}']
  }
}

How do I achieve that?


Attempts: I tried to use a pandas pivot to convert the data structure, but failed. This is what I tried:

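import pandas as pd

df = pd.DataFrame(list1)
# Keyword arguments are used because recent pandas versions no longer accept these positionally.
s = df.pivot(index=['ip', 'timestamp'], columns='message')
ss = s.assign(r=s.to_dict('records'))['r'].unstack(0).to_dict()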

Here I already have an issue with how the data looks: I need the message to be the value stored under the timestamp key, not part of another key that appears as a tuple.

>> print(s)
                            ingestionTime                                                                                                  
message                     {"body": "Idle time larger than time period. retry:0"} {"body": "Idle time larger than time period. retry:5"}
ip timestamp                                                                                                                    
11.22.33.42   1665480231698           NaN                                            1.665480e+12                                          
11.22.33.44   1665480231698  1.665480e+12                                                     NaN                                          
              1665480231699  1.665480e+12                                                     NaN                                          
>> print(ss)
{
  '11.22.33.42': {
    1665480231698: {
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:0"}'
      ): nan,
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:5"}'
      ): 1665480263198.0
    },
    1665480231699: nan
  },
  '11.22.33.44': {
    1665480231698: {
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:0"}'
      ): 1665480263198.0,
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:5"}'
      ): nan
    },
    1665480231699: {
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:0"}'
      ): 1665480263198.0,
      (
      'ingestionTime',
      '{"body": "Idle time larger than time period. retry:5"}'
      ): nan
    }
  }
}

Solution

  • The desired output is

    {
      '11.22.33.42': {
        1665480231698: ['{"body": "Idle time larger than time period. retry:5"}']
      },
      '11.22.33.44': {
        1665480231698: ['{"body": "Idle time larger than time period. retry:0"}'],
        1665480231699: ['{"body": "Idle time larger than time period. retry:0"}']
      }
    }
    

    Given the data the OP shared in the question, the second list (whitelist_metadata) is not needed to build this structure; list1 alone is enough.

    The following function does the work (the comments explain each step):

    def todict(list1):

        dict1 = {}  # result: {ip: {timestamp: [messages]}}

        for item in list1:  # iterate over the list of records

            if item['ip'] not in dict1:  # first time this ip is seen
                dict1[item['ip']] = {}  # add the ip as a key with an empty dict as its value

            if item['timestamp'] not in dict1[item['ip']]:  # first time this timestamp is seen for this ip
                dict1[item['ip']][item['timestamp']] = []  # add the timestamp as a key with an empty list as its value

            dict1[item['ip']][item['timestamp']].append(item['message'])  # append the message to the list

        return dict1
    

    Then one gets the following:

    result = todict(list1)  # named "result" rather than "dict" to avoid shadowing the built-in
    result
    
    [Out]:
    
    {'11.22.33.42': {1665480231698: ['{"body": "Idle time larger than time period. '
                                     'retry:5"}']},
     '11.22.33.44': {1665480231698: ['{"body": "Idle time larger than time period. '
                                     'retry:0"}'],
                     1665480231699: ['{"body": "Idle time larger than time period. '
                                     'retry:0"}']}}
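
Since the question states a preference for pandas, here is a minimal sketch of the same grouping done with `groupby`. It is one possible approach under stated assumptions, not part of the original answer. The function name `to_nested_dict` is hypothetical, and the `int(timestamp)` cast assumes that plain Python integers are wanted as the inner keys. Only the `ip`, `timestamp`, and `message` fields are used.

```python
import pandas as pd

# Sample data copied from the question.
list1 = [
    {'ip': '11.22.33.44', 'timestamp': 1665480231699,
     'message': '{"body": "Idle time larger than time period. retry:0"}',
     'ingestionTime': 1665480263198},
    {'ip': '11.22.33.42', 'timestamp': 1665480231698,
     'message': '{"body": "Idle time larger than time period. retry:5"}',
     'ingestionTime': 1665480263198},
    {'ip': '11.22.33.44', 'timestamp': 1665480231698,
     'message': '{"body": "Idle time larger than time period. retry:0"}',
     'ingestionTime': 1665480263198},
]


def to_nested_dict(records):
    """Build {ip: {timestamp: [messages]}} from a list of record dicts.

    Hypothetical helper name, chosen for this sketch only.
    """
    df = pd.DataFrame(records)

    # Collect the messages of every (ip, timestamp) pair into a list.
    # The result is a Series indexed by a two-level (ip, timestamp) MultiIndex.
    grouped = df.groupby(['ip', 'timestamp'])['message'].agg(list)

    nested = {}
    for (ip, timestamp), messages in grouped.items():
        # Assumption: cast to int so the keys are plain Python integers.
        nested.setdefault(ip, {})[int(timestamp)] = messages
    return nested


result = to_nested_dict(list1)
```

The final loop is still plain Python because a two-level Series does not convert directly into a dict of dicts of lists; pandas handles the grouping, and `setdefault` handles the nesting. Because `groupby` sorts the group keys by default, the ips and timestamps are inserted in sorted order.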