Search code examples
pythonpandasdataframedata-cleaning

Pandas column reshaping: aligning the values to the left (ignoring the outside zeros)


I have a set of data, where I predict the amount of fuel I need around 10 weeks ahead. I have it all set up in a single dataframe presented as staircase date. This means, the closer I come to the last entry for a week the more accurate the values get. I want to cut all missing values and ignore the exact date so I can just look at my predictions in relation to the distance of the predicted week.

Input dataframe:

Index 2020-01   2020-02  2020-03  2020-04  2020-05  2020-06
1.         10        10        5        0        0        0
2.          0         5        5       10        0        0
3.          0         0       10        4        3        0
4.          0         0        0        1        7        6

Outcome should be:

Index      W1        W2      W3
1.         10        10       5     
2.          5        5       10   
3.         10        4        3        
4.          1        7        6

Many Thanks in advance


Solution

  • You could replace the zeros with NaNs and reset the Series per row:

    df2 = (
     df.replace(0,float('nan'))
       .apply(lambda s: s.dropna().reset_index(drop=True), axis=1)
       .astype(int) 
    )
    
    df2.columns = df2.columns.map(lambda x: f'W{x+1}')
    

    output:

         W1  W2  W3
    1.0  10  10   5
    2.0   5   5  10
    3.0  10   4   3
    4.0   1   7   6
    

    ensuring that only the outer 0s are removed

    I set 2./2020-03 to 0 for this example

    You can use ffill+bfill to compute a mask:

    m = df.ne(0)
    m1 = m.where(m).bfill(axis=1)
    m2 = m.where(m).ffill(axis=1)
    
    df2 = (
     df.where(m1&m2)  # internal 0s are those True for both ffill/bfill
       .apply(lambda s: s.dropna().reset_index(drop=True), axis=1)
       .astype(int) 
    )
    
    df2.columns = df2.columns.map(lambda x: f'W{x+1}')
    

    output:

         W1  W2  W3
    1.0  10  10   5
    2.0   5   0  10
    3.0  10   4   3
    4.0   1   7   6