Search code examples
python-3.xpandasdataframedictionaryxlrd

Exporting a nested dictionary with multiple values and variable keys to excel


A second attempt from here. What I need is to export the following dictionary to excel.

{1: {'Field Cluster': ['This', 'This', 'This'], 
     'Exploration Block': ['Is', 'Is', 'Is'], 
     'Producing since': [1923.0, 1923.0, 1923.0], 
     'Fluids': ['A ', 'A ', 'A '], 
     'Reservoirs': ['Test', 'Test', 'Test'], 
     'Area (km2)': ['File', 'File', 'File'], 
     'Depth (m)': ['A\nHuge\nDepth', 'A\nHuge\nDepth', 'A\nHuge\nDepth'],   
     'Concession License No.': ['UNIX license', 'UNIX license', 'UNIX license'], 
     'License Expiry Date / Extension': ['Everlasting', 'Everlasting', 'Everlasting'], 
     'Working Interest with SB': ['There is one\n', 'There is one\n', 'There is one\n'], 
     'Government approval:': ['It is!', 'It is!', 'It is!'], 
     'Last study:': ['Million years ago', 'Million years ago', 'Million years ago'], 
     'Parameters': ['Horizon1', 'Horizon2', 'Horizon3'], 
     'Reservoir rock': ['First', 'Second', 'Third'], 
     'Net pay thickness (m)': [1.0, 21.0, 41.0], 
     'Avr. porosity (%)': [2.0, 22.0, 42.0], 
     'Average absolute permeability  (mD)': [3.0, 23.0, 43.0], 
     'Swi (%)': [4.0, 24.0, 44.0], 
     'Initial pressure (at)': [5.0, 25.0, 45.0], 
     'Bubble Pressure (at.)': [6.0, 26.0, 46.0], 
     'Dew Point Pressure (at)': [7.0, 27.0, 47.0], 
     'Initial Solution Ratio (Stm3/m3)': [8.0, 28.0, 48.0], 
     'Initial Condensate Gas Ratio (g/Stm3)': [9.0, 29.0, 49.0], 
     'Oil density (kg/cm)': [10.0, 30.0, 50.0], 
     'Oil viscosity (Pb) (cP)': [11.0, 31.0, 51.0], 
     'Contaminants (H2S, CO2)': [12.0, 32.0, 52.0], 
     'Initial Oil in Place (e3 to)': [13.0, 33.0, 53.0], 
     'Initial NGL in Place (e3 to)': [14.0, 34.0, 54.0]}, 
 2: {'Field Cluster': ['This fff', 'This fff', 'This fff', 'This fff'],                 
     'Exploration Block': ['fff', 'fff', 'fff', 'fff'], 
     'Producing since': ['1923fff', '1923fff', '1923fff', '1923fff'],     
     'Fluids': ['A fff', 'A fff', 'A fff', 'A fff'],
     'Reservoirs': ['Test', 'Test', 'Test', 'Test'], 
     'Area (km2)': ['File', 'File', 'File', 'File'], 
     'Depth (m)': ['A\nHuge\nDepthfff', 'A\nHuge\nDepthfff', 'A\nHuge\nDepthfff', 'A\nHuge\nDepthfff'], 
     'Concession License No.': ['UNIX license', 'UNIX license', 'UNIX license', 'UNIX license'], 
     'License Expiry Date / Extension': ['Everlastingfff', 'Everlastingfff', 'Everlastingfff', 'Everlastingfff'], 
     'Working Interest': ['There is one\n', 'There is one\n', 'There is one\n', 'There is one\n'], 
     'Gouvernment approval:': ['ffff', 'ffff', 'ffff', 'ffff'], 
     'Last study:': ['Million years fffff', 'Million years fffff', 'Million years fffff', 'Million years fffff'], 
     'Parameters': ['Horizon1', 'Horizon2', 'Horizon3', 'Horizon4'],     
     'Reservoir rock': ['First', 'Second', 'Third', 'Fourth'], 
     'Net pay thickness (m)': [1.0, 21.0, 41.0, 61.0], 
     'Avr. porosity (%)': [2.0, 22.0, 42.0, 62.0], 
     'Average absolute permeability  (mD)': [3.0, 23.0, 43.0, 63.0], 
     'Swi (%)': [4.0, 24.0, 44.0, 64.0], 
     'Initial Oil in Place (e3 to)': [13.0, 33.0, 53.0, 73.0], 
     'Initial NGL in Place (e3 to)': [14.0, 34.0, 54.0, 74.0], 
     'Initial Gas (assoc.) in Place (e6 m3) sol.gas/gas cap': [15.0, 35.0, 55.0, 75.0], 
     'Initial Gas (non assoc.) in Place (e6 m3)': [16.0, 36.0, 56.0, 76.0],    
     'Primary recovery / drive mechanism\nNone': ['Wow\nA', 'Recovery\nNone', 'Mechanism\nNone', 'Nice\nNone', ''], 
     'Secondary recovery': ['Another one', '', '', '', ''], 
     'Total Wells': ['1000', '-', '-', '-', ''], 
     'Productive wells (oil/gas)': ['500', '-', '-', '-', ''], 
     'Injection wells (water/gas)': ['500', '-', '-', '-', ''], 
     'Rate of best producer in the field (tons / e3 Sm3/day)': ['30', '-', '-', '-', ''], 
     'WOW Production (Something)': ['1', 2.0, '3', '4', '']}}

There were two answers given in the previous post. The first one:

df=pd.DataFrame(d) # assuming d is the name of the dict
cols=df.columns
final=pd.concat([pd.DataFrame(df[i].dropna().tolist()) for i in cols],axis=1,keys=cols)
final.index=df.index
print(final)

This one works perfectly only for the first of the nested dictionaries. The key issue is that There are some keys in the second sub-dictionary missing, and the values are sorted according to the order used by first dictionary. This causes the values not to match the corresponding parameters.

Another answer is quite similar, and it works nice with the test dictionary, but not with the above one:

df=pd.DataFrame(d) # assuming d is the name of the dict
cols=df.columns
final=pd.concat([pd.DataFrame(v).T for k,v in d.items()],axis=1,sort=False,keys=d.keys())
final.index=df.index
print(final)

With the actual dictionary this code returns only two rows with the parameters in a tuple. Moreover, it only considers the second sub-dictionary.

What I want in a nutshell: Assume we have this small dictionary, quite similar to the actual one:

{1: 
    {'Parameter 1': ['Value 1', 'Value 2', 'Value 3'], 
     'Parameter 2': ['Value 11', 'Value 22', 'Value 33'], 
     'Parameter 3': ['Num1', 'Num2', 'Num3']},
 2:
    {'Parameter 1': ['Data 1', 'Data 2', 'Data 3'], 
     'Parameter 2': ['Data 11', 'Data 22', 'Data 33'], 
     'Parameter 4': ['Numb11', 'Numb22', 'Numb33']}
}

I want to get such table from it:

            |               1             |             2            |    
---------------------------------------------------------------------
Parameter 1 | Value 1 | Value 2 | Value 3 | Data 1 | Data 2 | Data 3 |
----------------------------------------------------------------------
Parameter 2 | Value 11| Value 22| Value 33| Data 1 | Data 2 | Data 3 |
----------------------------------------------------------------------
Parameter 3 |   Num1  |   Num2  |   Num3  |        |        |        |
----------------------------------------------------------------------
Parameter 4 |         |         |         | Numb11 | Numb22 | Numb33 | 
----------------------------------------------------------------------

So each value corresponds to its parameter, and all the parameters are located in the first column and are not duplicated.


Solution

  • The following do the same job than the one you give (but there are two less):

    df_to_concat = {k: pd.DataFrame(v).transpose() for (k, v) in d.items()}
    df = pd.concat(df_to_concat.values(), keys=df_to_concat.keys(), axis='columns')
    

    But your, big dictionary have unequal lists giving the the following error:

    ValueError: arrays must all be same length
    

    The last keys have a last empty values. When I delete by hand, the code works. If you want to do it programaticaly, you can do somthing like that before creating the dataframe, it deletes the last values of list with too many items:

    min_length = {k: min([len(one_list) for one_list in v.values()]) for (k, v) in d.items()}
    new_d = {}
    for k, v in d.items():
        new_v = {}
        for k2, one_list in v.items():
            new_v.update({k2: one_list[:min_length[k]]})
        new_d.update({k: new_v})