Search code examples
pythonpandasmulti-index

Creating a multi-header table from JSON


I'm new to pandas. I'm trying to create the following table:

        |          C         |          Perl      |
        |   line   |   func  |    line   |   func |
version |  covered |  total  |  covered  |  total |

There is the following structure of JSON that I have created:

{
   "version1":{
      "perl":{
         "line_covered":207,
         "line_total":312,
         "func_covered":15,
         "func_total":18
      },
      "C":{
         "line_covered":321,
         "line_total":512,
         "func_covered":10,
         "func_total":10
      }
   }
}

I want to iterate over this JSON and create the table. My first problem is that I can't seem to understand how should I create the header of the table. From previous topics, I found that I should use MultiIndex but for some reason, any combination I tried, does not create the wanted header. Is it possible to show me how to create this table?


Solution

  • My approach

    import numpy as np
    from collections import defaultdict
    d = defaultdict(list)
    

    for k,v in my_dict.items():
        d['index'].append(k)
        for k1,v1 in v.items():
            for k2,v2 in v1.items():
                d['columns'].append((k1,k2))
                d['data'].append(v2)
    d = dict(d)
    d['data'] = np.array(d['data']).reshape(1, len(d['data']))
    d['columns'] = pd.MultiIndex.from_tuples(columns)
    

    Build DataFrame

    pd.DataFrame(**d)
    

    Output

                     perl                                               C                       
             line_covered line_total func_covered func_total func_covered           func_total   
    version1          207        312           15         18           10  version1         10   
    

    use defaultdict but could just start three lists before starting the loop

    EDIT

    for expected output use

    import numpy as np
    from collections import defaultdict
    d = defaultdict(list)
    for k,v in my_dict.items():
        d['index'].append(k)
        for k1,v1 in v.items():
            for k2,v2 in v1.items():
                split = k2.split('_')
                d['columns'].append((k1, split[0]))
                d['data'].append(split[1])
    d = dict(d)
    d['data'] = np.array(d['data']).reshape(1,len(d['data']))
    d['columns'] = pd.MultiIndex.from_tuples(d['columns']).copy()
    pd.DataFrame(**d)
    

    Output

                 perl                               C       
                 line   line     func   func     func   func
    version1  covered  total  covered  total  covered  total
    

    Details

    print(d)
    #{'index': ['version1'], 'columns': MultiIndex([('perl', 'line_covered'),
    #            ('perl',   'line_total'),
    #            ('perl', 'func_covered'),
    #            ('perl',   'func_total'),
    #            (   'C', 'func_covered'),
    #            (   'C',   'func_total')],
    #           ), 'data': array([[207, 312,  15,  18,  10,  10]])}
    

    you can see what** do