I have a dataframe (pandas) that I want to transform for displaying purposes. Therefore I want to shift some parts of the dataframe to new rows like below :
col1 col2 col_to_shift col_not_to_shift1 col_not_to_shift2
0 a1 a2 a3 a4 a5
1 b1 b2 NaN b4 NaN
2 c1 c2 c3 c4 c5
I would like to obtain the following dataframe where a new row is created for the value (if not NaN
) in the column to shift (unique), duplicating the data description contained in col1
and col2
and keeping the columns that I don't want to shift even if they contain NaNs:
col1 col2 col_to_shift col_not_to_shift1 col_not_to_shift2
0 a1 a2 a3 NaN NaN
1 a1 a2 NaN a4 a4
2 b1 b2 NaN b4 NaN
3 c1 c2 c3 NaN NaN
4 c1 c2 NaN c4 c4
I tried looking at pd.shift
but was not able to make it work.
Here is a piece of code to produce the dataframe :
data = {"col1": ['a1', 'b1', 'c1'], 'col2': ['a2', 'b2', 'c2'],
'col_to_shift': ['a3', np.NaN, 'c3'],
'col_not_to_shift1': ['a4', 'b4', 'c4'],
'col_not_to_shift2': ['a5', np.NaN, 'c5']}
df = pd.DataFrame(data)
One option using concat
:
dropna
sort_values
and a stable sort algorithmgroup = ['col1', 'col2']
cols = df.columns[df.columns.str.contains('not_to_shift')]
out = (pd.concat([df.drop(columns=cols).dropna(axis=0),
df[cols.union(group)].dropna(axis=0)
])
.sort_values(by=group, kind='stable')
)
Output:
col1 col2 col_to_shift col_not_to_shift
0 a1 a2 a3 NaN
0 a1 a2 NaN a4
1 b1 b2 NaN b4
2 c1 c2 c3 NaN
2 c1 c2 NaN c4
Variant using the index as group (this does not maintain the original index):
group = ['col1', 'col2']
not_shift = ['col_not_to_shift']
tmp = df.set_index(group)
out = (pd.concat([tmp.drop(columns=not_shift), tmp[not_shift]])
.dropna(how='all').sort_index(kind='stable')
.reset_index()
)
Or by defining the list of columns to shift/not-shift:
shift = ['col_to_shift']
not_shift = ['col_not_to_shift1', 'col_not_to_shift2']
out = (pd.concat([df.drop(columns=not_shift),
df.drop(columns=shift)
])
.dropna(subset=shift+not_shift, how='all')
.sort_index(kind='stable')
)
Output:
col1 col2 col_to_shift col_not_to_shift1 col_not_to_shift2
0 a1 a2 a3 NaN NaN
0 a1 a2 NaN a4 a5
1 b1 b2 NaN b4 NaN
2 c1 c2 c3 NaN NaN
2 c1 c2 NaN c4 c5