Search code examples
pythonexcelpandasdataframemulti-index

How to prevent repeating data for additional indexes in a pandas dataframe


I wanted to see how I can add new data when I include a new index into the excel sheet. This is what I mean:

Picture 1 (What my code currently outputs):

enter image description here

Picture 2 (What I want it to output):

enter image description here

I want to add [1, 2] to row 1, data type 2, and [3, 4] to row 2, data type 2. Here's my code for picture 1:

import pandas as pd

# Create a Pandas dataframe from the data.
df1 = pd.DataFrame([['a', 'b'], ['c', 'd']],
                    index=['row 1', 'row 2'],
                    columns=['col 1', 'col 2'])

# Create a Pandas Excel writer using XlsxWriter as the engine.
writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')

# Add extra index
new_idx = pd.Index(['data type 1', 'data type 2'])
mi1 = pd.MultiIndex.from_product([df1.index, new_idx])
out1 = df1.reindex(df1.index.repeat(len(new_idx))).set_index(mi1)

# Convert the dataframe to an XlsxWriter Excel object.
out1.to_excel(writer, sheet_name='Sheet1')

# Close the Pandas Excel writer and output the Excel file.
writer.close()

Is there a possible way to do this?


Solution

  • slice(None) can be used to select all the rows at that level. Since you want to change values where the index is 'data type 2' in level=1, try the following:

    # write new values to data type 2 indices
    out1.loc[(slice(None), 'data type 2'), :] = [[1, 2], [3, 4]]
    

    So the full code would look like:

    # Create a Pandas dataframe from the data.
    df1 = pd.DataFrame([['a', 'b'], ['c', 'd']], index=['row 1', 'row 2'], columns=['col 1', 'col 2'])
    
    # # Create a Pandas Excel writer using XlsxWriter as the engine.
    writer = pd.ExcelWriter('pandas_simple.xlsx', engine='xlsxwriter')
    
    # Add extra index
    new_idx = pd.Index(['data type 1', 'data type 2'])
    mi1 = pd.MultiIndex.from_product([df1.index, new_idx])
    out1 = df1.reindex(df1.index.repeat(len(new_idx))).set_index(mi1)
    
    # write new values to data type 2 indices
    out1.loc[(slice(None), 'data type 2'), :] = [[1, 2], [3, 4]]    # <-- add this line
    
    # # Convert the dataframe to an XlsxWriter Excel object.
    out1.to_excel(writer, sheet_name='Sheet1')
    
    writer.close()
    

    Then pd.read_excel('pandas_simple.xlsx', index_col=[0,1]) produces the following frame:

    result

    Another way is to change the way you contruct out1; instead of reindexing df1, construct another dataframe using the same "metadata" as df1 and concatenate the two:

    out1 = (
        pd.concat({
            'data type 1': df1, 
            'data type 2': pd.DataFrame([[1, 2], [3, 4]], columns=df1.columns, index=df1.index)
        }).swaplevel().sort_index()
    )