Search code examples
pythonpandasgspread

MultiIndex pandas dataframe and writing to Google Sheets using gspread-pandas


Starting with the following dictionary:

test_dict = {'header1_1': {'header2_1': {'header3_1': {'header4_1': ['322.5', 330.0, -0.28],
    'header4_2': ['322.5', 332.5, -0.26]},
   'header3_2': {'header4_1': ['285.0', 277.5, -0.09],
    'header4_2': ['287.5', 277.5, -0.12]}},
  'header2_2': {'header3_1': {'header4_1': ['345.0', 357.5, -0.14],
    'header4_2': ['345.0', 362.5, -0.14]},
   'header3_2': {'header4_1': ['257.5', 245.0, -0.1],
    'header4_2': ['257.5', 240.0, -0.08]}}}}

I want the headers in the index, so I reform the dictionary:

reformed_dict = {}
for outerKey, innerDict in test_dict.items():
    for innerKey, innerDict2 in innerDict.items():
        for innerKey2, innerDict3 in innerDict2.items():
            for innerKey3, values in innerDict3.items():
                reformed_dict[(outerKey,
                        innerKey, innerKey2, innerKey3)] = values

And assign column names to the headers:

keys = reformed_dict.keys()
values = reformed_dict.values()
index = pd.MultiIndex.from_tuples(keys, names=["H1", "H2", "H3", "H4"])
df = pd.DataFrame(data=values, index=index)

That gets to a dataframe that looks like this: enter image description here

Issue #1 [*** this has been answered by @AzharKhan, so feel free to skip ahead to Issue #2 ***]: To assign names to the data columns, I tried:

df.columns = ['col 1', 'col 2' 'col 3']

and got error: "ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements"

Then per a suggestion, I tried:

df = df.rename(columns={'0': 'Col1', '1': 'Col2', '2': 'Col3'})

This does not generate an error, but the dataframe looks exactly the same as before, with 0, 1, 2 as the data column headers.

How can I assign names to these data columns? I assume 0, 1, 2 are column indices, not column names.

Issue #2: When I write this dataframe to Google Sheets using gspread-pandas:

s.open_sheet('test')
Spread.df_to_sheet(s, df, index=True, headers=True, start='A8', replace=False) 

The result is this: enter image description here

What I would like is this: enter image description here

This is how the dataframe appears in Jupyter notebook screenshot earlier, so it seems the process of writing to spreadsheet is filling in the empty row headers, which makes the table harder to read at a glance.

How can I get the output to spreadsheet to omit the row headers until they have changed, and thus get the second spreadsheet output?


Solution

  • The creator of gspread-pandas has added the functionality to merge indexes when writing a dataframe to Google Sheets. It's not yet in general release version of gspread-pandas, but can be found here: https://github.com/aiguofer/gspread-pandas/pull/92