Search code examples
pandasindexingpivotmelt

Rearranging data for multi line headers in pandas


I am working with the following data from FWI in pandas

enter image description here I want to transform it to following form for each country enter image description here

I have tried using the melt, stack, and pivot function but have not been able to achieve desired result with these multi line headers. Link for the dataset: https://freedomhouse.org/sites/default/files/2021-02/Country_and_Territory_Ratings_and_Statuses_FIW1973-2021.xlsx Can anyone help please?


Solution

  • You need to read the column headers as a MultiIndex, then unstack the columns (unstack works on the row index, so you have to transpose the dataframe).

    df = pd.read_excel('Country_and_Territory_Ratings_and_Statuses_FIW1973-2021.xlsx', 'Territory Ratings, Statuses',skiprows=1, header=[0,1], index_col=0)
    

    The issue with this particular Excel table is that there is a trailing space after some 'CL' header names (i.e. they are 'CL' and 'CL '). You can fix this by re-creating the index like so:

    df.columns = pd.MultiIndex.from_arrays([df.columns.get_level_values(0),
                                            df.columns.get_level_values(1).str.strip()])
    

    Then unstack:

    result = df.T.unstack(0).T