Search code examples
pythonpandasdataframemulti-index

how to make dataframe index from column name string? bonus question: then to get the spread dataframe


I have a dataframe with a 1300 row and 400 column looks like this

            AOC.2017Jan  AOC.2017Feb  ...  ZTP.2021Oct  ZTP.2021Nov
VALUE_TIME                            ...                          
2016-07-07          NaN          NaN  ...          NaN          NaN
                ...          ...  ...          ...          ...
2021-10-14          NaN          NaN  ...          NaN     101.1000
2021-10-15          NaN          NaN  ...          NaN      88.6250
2021-10-18          NaN          NaN  ...          NaN      90.1375
2021-10-19          NaN          NaN  ...          NaN      91.1125
2021-10-20          NaN          NaN  ...          NaN      93.5500

I wish to retrieve the 3 letters and dates to make them as index (to the long format), what is the most Pythonic/Pandas way?

This is roughly the desired effect with rows between the first 2 dates not shown

VALUE_TIME Group Date      Value                        
2016-07-07 AOC   2017Jan   NaN   
           AOC   2017Feb   NaN 
           ZTP   2021Oct   NaN
           ZTP   2021Nov   NaN
2021-10-14 ZTP   2021Nov  101.1000
2021-10-15 ZTP   2021Nov  88.6250
2021-10-18 ZTP   2021Nov  90.1375
2021-10-19 ZTP   2021Nov  91.1125
2021-10-20 ZTP   2021Nov  93.5500

BTW my final goal is to have a dataframe showing every possible spread between any product(AOC,ZTP etc) pairs which have the same value_time and Date

Final goal should looks like this

VALUE_TIME Spread    Date      Value                        
2016-07-07 AOC-BBC   2017Jan   xxx
           AOC-BBC   2017Feb   xxx
           AOC-ZTP   2017Jan   xxx
           AOC-ZTP   2017Feb   NaN 
           BBC-ZTP   2017Feb   NaN 
           BBC-ZTP   2017Feb   NaN 
2016-07-08 AOC-BBC   2017Jan   xxx
           AOC-BBC   2017Feb   xxx
           AOC-ZTP   2017Jan   xxx
           AOC-ZTP   2017Feb   NaN 
           BBC-ZTP   2017Feb   NaN 
           BBC-ZTP   2017Feb   NaN 

Solution

  • To answer your first question:

    df.columns=pd.MultiIndex.from_tuples([x.split('.') for x in df], names=['Group','Date'])
    
    df.stack(level=[0,1], dropna=False).to_frame(name='Value')
    

    You would get something like this:

                              Value
    VALUE_TIME Group Date          
    2016-07-07 AOC   2017Feb    NaN
                     2017Jan    NaN
                     2021Nov    NaN
                     2021Oct    NaN
               ZTP   2017Feb    NaN
                     2017Jan    NaN
                     2021Nov    NaN
                     2021Oct    NaN
    2021-10-14 AOC   2017Feb    NaN
                     2017Jan    NaN
                     2021Nov    NaN
                     2021Oct    NaN
               ZTP   2017Feb    NaN
                     2017Jan    NaN
                     2021Nov  101.1
                     2021Oct    NaN
    

    Your other question isn't very clear, however.