python pandas dataframe numpy fillna

How to shift a dataframe element-wise to fill NaNs?


I have a DataFrame like this:

>>> df = pd.DataFrame({'a': list('ABCD'), 'b': ['E',np.nan,np.nan,'F']})
   a    b
0  A    E
1  B  NaN
2  C  NaN
3  D    F

I am trying to fill each NaN with the value from the previous column in the next row, and then drop that next row. In other words, I want to combine the two rows that contain NaNs into a single row without NaNs, like this:

   a  b
0  A  E
1  B  C
2  D  F

I have tried various flavors of df.fillna(method="<bfill/ffill>"), but none of them gave me the expected output.
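
For reference, here is a small sketch of what the two fill methods do on this frame. It uses df.ffill() and df.bfill(), which I am assuming are equivalent to the fillna(method=...) calls mentioned above. Both only copy values within the same column, so they cannot pull 'C' across from column a:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('ABCD'), 'b': ['E', np.nan, np.nan, 'F']})

# Forward fill copies the last valid value of the same column downward.
forward = df.ffill()
# Backward fill copies the next valid value of the same column upward.
backward = df.bfill()

print(forward['b'].tolist())   # ['E', 'E', 'E', 'F']
print(backward['b'].tolist())  # ['E', 'F', 'F', 'F']
```

Neither result drops a row or moves a value between columns, which is what the expected output needs.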

I haven't found another question about this exact problem. Here's a related one. The DataFrame is actually built from a list of DataFrames with .concat(), as you may notice from the indexes. I mention this because the operation may be easier to do on a single row than on multiple rows.

I have found some suggestions to use shift and combine_first, but none of them worked for me. You may try these too.

I have also found this. It is a whole article about filling NaN values, but I haven't found a problem or answer like mine.


Solution

  • OK, I misunderstood what you wanted to do the first time. The dummy example was a bit ambiguous.

    Here is another attempt:

    >>> df = pd.DataFrame({'a': list('ABCD'), 'b': ['E',np.nan,np.nan,'F']})
       a    b
    0  A    E
    1  B  NaN
    2  C  NaN
    3  D    F
    

    To my knowledge, pandas has no built-in operation for this, so we will use numpy to do the work.

    First convert the dataframe to a numpy array and flatten it to one dimension. Then drop the NaNs using pandas.isna, which works on a wider range of types than numpy.isnan, and reshape the array back to the original number of columns before converting it back to a dataframe:

    array = df.to_numpy().flatten()
    pd.DataFrame(array[~pd.isna(array)].reshape(-1,df.shape[1]), columns=df.columns)
    

    output:

       a  b
    0  A  E
    1  B  C
    2  D  F
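
The same steps, spelled out one at a time as a self-contained sketch. The intermediate names (flat, keep, values) are only there for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': list('ABCD'), 'b': ['E', np.nan, np.nan, 'F']})

# 1. Flatten row by row: A, E, B, NaN, C, NaN, D, F
flat = df.to_numpy().flatten()

# 2. Boolean mask of the values to keep. pd.isna accepts object arrays;
#    np.isnan raises TypeError on an object array like this one.
keep = ~pd.isna(flat)

# 3. Drop the NaNs and cut the remaining values into rows of df.shape[1] columns.
values = flat[keep].reshape(-1, df.shape[1])

result = pd.DataFrame(values, columns=df.columns)
print(result)
```

Because the flattening goes row by row, each surviving value slides into the first free slot, which is what moves 'C' next to 'B'.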
    

    This also works for more complex examples, as long as the columns that contain NaNs all share the same NaN pattern:

    In:
       a    b   c    d
    0  A    H  A2   H2
    1  B  NaN  B2  NaN
    2  C  NaN  C2  NaN
    3  D    I  D2   I2
    4  E  NaN  E2  NaN
    5  F  NaN  F2  NaN
    6  G    J  G2   J2
    
    Out:
       a   b   c   d
    0  A   H  A2  H2
    1  B  B2   C  C2
    2  D   I  D2  I2
    3  E  E2   F  F2
    4  G   J  G2  J2
    
    In:
       a    b    c
    0  A    F    H
    1  B  NaN  NaN
    2  C  NaN  NaN
    3  D  NaN  NaN
    4  E    G    I
    
    Out:
       a  b  c
    0  A  F  H
    1  B  C  D
    2  E  G  I
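
One thing to keep in mind: the reshape only succeeds when the number of non-NaN values is a multiple of the number of columns. Below is a sketch of a guarded variant; elementwise_shift_checked is a hypothetical helper name, and the check only catches size mismatches, not values that end up in an unintended column:

```python
import numpy as np
import pandas as pd

def elementwise_shift_checked(df):
    """Hypothetical guarded variant: same flatten/drop/reshape steps, plus a size check."""
    flat = df.to_numpy().flatten()
    kept = flat[~pd.isna(flat)]
    n_cols = df.shape[1]
    if kept.size % n_cols != 0:
        raise ValueError(f"{kept.size} non-NaN values cannot fill rows of {n_cols} columns")
    return pd.DataFrame(kept.reshape(-1, n_cols), columns=df.columns)

# Same three-column example as above: 9 values, 3 columns, so it fits.
ok = pd.DataFrame({'a': list('ABCDE'),
                   'b': ['F', np.nan, np.nan, np.nan, 'G'],
                   'c': ['H', np.nan, np.nan, np.nan, 'I']})
print(elementwise_shift_checked(ok))

# 5 values cannot be arranged into rows of 2 columns, so this raises.
bad = pd.DataFrame({'a': list('ABC'), 'b': ['E', np.nan, 'F']})
try:
    elementwise_shift_checked(bad)
except ValueError as exc:
    print(exc)
```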
    

    If the columns with NaNs do not share the same pattern, as in:

       a    b   c    d
    0  A    H  A2  NaN
    1  B  NaN  B2  NaN
    2  C  NaN  C2   H2
    3  D    I  D2   I2
    4  E  NaN  E2  NaN
    5  F  NaN  F2  NaN
    6  G    J  G2   J2
    

    You can apply the operation per group of two columns:

    def elementwise_shift(df):
        array = df.to_numpy().flatten()
        return pd.DataFrame(array[~pd.isna(array)].reshape(-1,df.shape[1]), columns=df.columns)
    
    (df.groupby(np.repeat(np.arange(df.shape[1] // 2), 2), axis=1)
       .apply(elementwise_shift)
    )
    

    output:

       a  b   c   d
    0  A  H  A2  B2
    1  B  C  C2  H2
    2  D  I  D2  I2
    3  E  F  E2  F2
    4  G  J  G2  J2
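
Recent pandas releases deprecate the axis=1 argument of groupby, so here is a sketch of the same per-pair idea without it. It slices the columns two at a time with iloc and glues the pieces back together with pd.concat. This is an alternative I am suggesting under the assumption that the columns come in adjacent pairs and that every pair collapses to the same number of rows:

```python
import numpy as np
import pandas as pd

def elementwise_shift(df):
    array = df.to_numpy().flatten()
    return pd.DataFrame(array[~pd.isna(array)].reshape(-1, df.shape[1]), columns=df.columns)

df = pd.DataFrame({
    'a': list('ABCDEFG'),
    'b': ['H', np.nan, np.nan, 'I', np.nan, np.nan, 'J'],
    'c': ['A2', 'B2', 'C2', 'D2', 'E2', 'F2', 'G2'],
    'd': [np.nan, np.nan, 'H2', 'I2', np.nan, np.nan, 'J2'],
})

# Slice the columns two at a time, shift each pair on its own, then glue side by side.
pairs = [df.iloc[:, i:i + 2] for i in range(0, df.shape[1], 2)]
result = pd.concat([elementwise_shift(pair) for pair in pairs], axis=1)
print(result)
```

If the pairs collapse to different row counts, the concat pads the shorter ones with NaN instead of failing, so it is worth checking the result for leftover NaNs.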