I have a dataframe that looks similar to this:
df1 = pd.DataFrame({'product': ['widget1', 'widget1', 'widget1', 'widget2', 'widget2', 'widget3', 'widget3', 'widget4'],
'group': ['group1', 'group1', '', 'group1', '','group2', '', '']})
and want to output this:
df2 = pd.DataFrame({'product': ['widget1', 'widget1', 'widget1', 'widget2', 'widget2', 'widget3', 'widget3', 'widget4'],
'group': ['group1', 'group1', '', 'group1', '','group2', '', ''],
'new group': ['group1', 'group1', 'group1', 'group1', 'group1','group2', 'group2', '']})
So that a new column is created that:
Any help on this would be appreciated! Thanks!
Try this set of codes:
import numpy as np
df1['group'][df1['group'] == ''] = np.nan
df1['new group'] = df1.groupby('product')['group'].ffill().bfill()
df1[['group', 'new group']] = df1[['group', 'new group']].fillna('')
Output result:
product group new group
0 widget1 group1 group1
1 widget1 group1 group1
2 widget1 group1
3 widget2 group1 group1
4 widget2 group1
5 widget3 group2 group2
6 widget3 group2
7 widget4
The codes work as follows:
1st line: Temporarily set empty string in column group
to nan
2nd line: Group the rows by the same product
groups. Then for each group with same product
, look into the column group
and perform both forward fill and backward fill to fillup any nan
values with non-nan values anywhere in both directions within the group. The resulting Series of data (with null values now filled-up as far as possible) is copied to the new column new group
.
3rd line: reset the empty strings previously translated to nan
back to empty string. If other columns in your original dataframe does not contain nan
values, you can simplify this line of code to df1.fillna('')
Note that these codes assumed for every product, there can only be one possible value of group value (if any). If there can be 2 or above, the forward fill and backward fill operations may get different values depending on the positions of the multiple non-null group values.