Search code examples
pythonpandasindexingalignment

Aligning Pandas columns Dataframe by Dates


The dataframe is as follow (3 items...original has hundreds):

                    Log1     Date2      Log2    Date3       Log3
           Date1
          01.01.2000 1000     02.01.2000 2000    01.01.2000  3000
          02.01.2000 1050     03.01.2000 1950    02.01.2000  3020
          03.01.2000 1100     04.01.2000 2000    03.01.2000  3000

The desired outlook, aligning the dates:

                   Log1  Log2  Log3  
          Date     
      01.01.2000 1,000   nan 3,000
      02.01.2000 1,050 2,000 3,020
      03.01.2000 1,100 1,950 3,000
      04.01.2000   nan 2,000   nan
  • Question: How to align by dates?

Short example of dataframe:

                BBAS3   Data.1      PETR4   Data.2     TRSD      Data.3   JKHD
    Data                            
    2020-10-05  30.15   2020-10-05  19.91   2020-10-05  30.15   2020-10-05  19.91
    2020-10-02  29.71   2020-10-02  19.02   2020-10-02  29.71   2020-10-01  19.85
    2020-10-01  29.79   2020-10-01  19.85   2020-10-01  29.79   2020-09-30  19.61
    2020-09-30  29.62   2020-09-30  19.61   2020-09-30  29.62   2020-09-29  19.31
    2020-09-29  29.76   2020-09-29  19.31   2020-09-29  29.76   2020-09-28  19.63

Solution

  • One idea if input data has DatetimeIndex loop by unpair and pairs columns names, create Series and concat together:

    #convert Datetimeindex to column
    df1 = df.reset_index()
    
    zipped = zip(df1.columns[::2], df1.columns[1::2])
    df1 = pd.concat([df1.set_index(a)[b] for a, b in zipped], axis=1)
    df1.index = pd.to_datetime(df1.index)
    df1 = df1.sort_index()
    
    print (df1)
                BBAS3  PETR4   TRSD   JKHD
    2020-09-28    NaN    NaN    NaN  19.63
    2020-09-29  29.76  19.31  29.76  19.31
    2020-09-30  29.62  19.61  29.62  19.61
    2020-10-01  29.79  19.85  29.79  19.85
    2020-10-02  29.71  19.02  29.71    NaN
    2020-10-05  30.15  19.91  30.15  19.91
    

    EDIT:

    #sample data generate error - because duplicated dates in soem column like here in Data
    print (df)
                 BBAS3      Data.1  PETR4      Data.2   TRSD      Data.3   JKHD
    Data                                                                       
    2020-10-05  200.00  2020-10-05  19.91  2020-10-05  30.15  2020-10-05  19.91
    2020-10-05  100.00  2020-10-02  19.02  2020-10-02  29.71  2020-10-01  19.85
    2020-10-01   29.79  2020-10-01  19.85  2020-10-01  29.79  2020-09-30  19.61
    2020-09-30   29.62  2020-09-30  19.61  2020-09-30  29.62  2020-09-29  19.31
    2020-09-29   29.76  2020-09-29  19.31  2020-09-29  29.76  2020-09-28  19.63
    
    df1 = df.reset_index()
    
    zipped = zip(df1.columns[::2], df1.columns[1::2])
    df1 = pd.concat([df1.groupby(a)[b].sum() for a, b in zipped], axis=1)
    df1.index = pd.to_datetime(df1.index)
    df1 = df1.sort_index()
    
    print (df1)
                 BBAS3  PETR4   TRSD   JKHD
    2020-09-28     NaN    NaN    NaN  19.63
    2020-09-29   29.76  19.31  29.76  19.31
    2020-09-30   29.62  19.61  29.62  19.61
    2020-10-01   29.79  19.85  29.79  19.85
    2020-10-02     NaN  19.02  29.71    NaN
    2020-10-05  300.00  19.91  30.15  19.91