Search code examples
pythonpandastabula

Shifting part of a row in a Dataframe to the right?


The dataframe in question is reading in from a pdf file using Tabula and getting some columns in the wrong places. It looks something like this:

Index  Name Date  Time  Exp   QT        Comm   Load  Notes
0      VT1  04/16 4:00  Glen  1600      Wheat  NaN   LTA/Book
1      VT2  04/16 4:00  Cof   16000     Wheat  NaN   Split/LTA
2      VT3  Glen  35100 Wheat LTA       NaN    NaN   NaN
3      VT4  Glen  18416 Wheat Split/LTA NaN    Nan   NaN

I'm not sure why it comes out this way, but the issue is obvious that in the final rows of this dataframe the formatting is no longer reading in correctly and begins to ignore empty cells and move the data leftward. What I'd like to do is move rows 2 and 3 so they are correctly oriented to the columns. The end result should be:

Index  Name Date  Time  Exp   QT        Comm   Load  Notes
0      VT1  04/16 4:00  Glen  1600      Wheat  NaN   LTA/Book
1      VT2  04/16 4:00  Cof   16000     Wheat  NaN   Split/LTA
2      VT3  NaN   NaN   Glen  35100     Wheat  NaN   LTA      
3      VT4  NaN   NaN   Glen  18416     Wheat  NaN   Split/LTA

I can't think of a way to do this that doesn't include something over-complicated and inefficient.


Solution

  • I think you have to mask all rows that meet the condition.

    In this case if ['Comm', 'Load', 'Notes'] always NaN.

    mask = df[['Comm', 'Load', 'Notes']].isna().all(axis=1)
    

    Then use .shift and assign it back to the dataframe.

    df.loc[mask, 'Date':'Notes'] = df.loc[mask, 'Date':'Notes'].shift(2, axis=1)
    
    df
    
       Index Name   Date  Time   Exp     QT   Comm  Load      Notes
    0      0  VT1  04/16  4:00  Glen   1600  Wheat   NaN   LTA/Book
    1      1  VT2  04/16  4:00   Cof  16000  Wheat   NaN  Split/LTA
    2      2  VT3    NaN   NaN  Glen  35100  Wheat   NaN        LTA
    3      3  VT4    NaN   NaN  Glen  18416  Wheat   NaN  Split/LTA