Search code examples
pythonpandasmultidimensional-arraymulti-index

Convert 2D dataframe to 3D based upon column names


I have a dataframe with this shape:

hpa02upc00            hpa01upc01            hpa01upc00            hpa00upc02            hpa00upc01            hpa00upc00
pwr                                                                                                                                    
 0                      58.28                 58.23                 58.95                 58.09                 58.84                   NaN
-1                        NaN                   NaN                 58.16                   NaN                 58.06                 58.85
-2                        NaN                   NaN                   NaN                   NaN                   NaN                 58.08

I want to turn it into a 3D data frame for filtering with the hpaXX value and the upcXX values as the new axis.

I am new to multiaxis and could use some help getting started on how to both extract the columns and regenerate the new dataframe.

Thanks!


Solution

  • IIUC, you can create a multiindex after extract-ing the patterned strings from your column names:

    df.columns = pd.MultiIndex.from_frame(df.columns.str.extract("(hpa.*)(upc.*)", expand=True), names=["hpa","upc"])
    
    >>> df["hpa00"]
    upc  upc02  upc01  upc00
    pwr                     
     0   58.09  58.84    NaN
    -1     NaN  58.06  58.85
     2     NaN    NaN  58.08
    
    >>> df["hpa00"]["upc02"]
    pwr
     0    58.09
    -1      NaN
     2      NaN
    Name: upc02, dtype: float64
    

    Input dataframe:

    df = pd.DataFrame({"pwr": [0,-1,2],
                       "hpa02upc00": [58.28, None, None],
                       "hpa01upc01": [58.23, None, None],
                       "hpa01upc00": [58.95,58.16,None],
                       "hpa00upc02": [58.09,None,None],
                       "hpa00upc01": [58.84,58.06,None],
                       "hpa00upc00": [None,58.85,58.08],
                       })
    
    df = df.set_index("pwr")