The dataframe in question is reading in from a pdf file using Tabula and getting some columns in the wrong places. It looks something like this:
Index Name Date Time Exp QT Comm Load Notes
0 VT1 04/16 4:00 Glen 1600 Wheat NaN LTA/Book
1 VT2 04/16 4:00 Cof 16000 Wheat NaN Split/LTA
2 VT3 Glen 35100 Wheat LTA NaN NaN NaN
3 VT4 Glen 18416 Wheat Split/LTA NaN Nan NaN
I'm not sure why it comes out this way, but the issue is obvious that in the final rows of this dataframe the formatting is no longer reading in correctly and begins to ignore empty cells and move the data leftward. What I'd like to do is move rows 2 and 3 so they are correctly oriented to the columns. The end result should be:
Index Name Date Time Exp QT Comm Load Notes
0 VT1 04/16 4:00 Glen 1600 Wheat NaN LTA/Book
1 VT2 04/16 4:00 Cof 16000 Wheat NaN Split/LTA
2 VT3 NaN NaN Glen 35100 Wheat NaN LTA
3 VT4 NaN NaN Glen 18416 Wheat NaN Split/LTA
I can't think of a way to do this that doesn't include something over-complicated and inefficient.
I think you have to mask all rows that meet the condition.
In this case if ['Comm', 'Load', 'Notes']
always NaN
.
mask = df[['Comm', 'Load', 'Notes']].isna().all(axis=1)
Then use .shift
and assign it back to the dataframe.
df.loc[mask, 'Date':'Notes'] = df.loc[mask, 'Date':'Notes'].shift(2, axis=1)
df
Index Name Date Time Exp QT Comm Load Notes
0 0 VT1 04/16 4:00 Glen 1600 Wheat NaN LTA/Book
1 1 VT2 04/16 4:00 Cof 16000 Wheat NaN Split/LTA
2 2 VT3 NaN NaN Glen 35100 Wheat NaN LTA
3 3 VT4 NaN NaN Glen 18416 Wheat NaN Split/LTA