Using method chaining, I'm looking to create a new column that holds, for each group, the value of col_2 from the row where col_1 == 0.
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame({'group': list('AAABBBCCDDDD'),
                   'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
                   'col_2': np.random.randint(0, 10, 12)})
group col_1 col_2
0 A -1 5
1 A 0 8
2 A 1 9
3 B -1 5
4 B 0 0
5 B 1 0
6 C 0 1
7 C 1 7
8 D -1 6
9 D 0 9
10 D 1 2
11 D 2 4
Desired output:
group col_1 col_2 new_col
0 A -1 5 8
1 A 0 8 8
2 A 1 9 8
3 B -1 5 0
4 B 0 0 0
5 B 1 0 0
6 C 0 1 1
7 C 1 7 1
8 D -1 6 9
9 D 0 9 9
10 D 1 2 9
11 D 2 4 9
My approach using groupby transform (I would love for this to work, but apparently transform only accesses a single column at a time):
df.assign(
new_col = lambda df_: df_.groupby('group').transform(lambda x: x.loc[x.col_1==0].col_2)
)
AttributeError: 'Series' object has no attribute 'col_1'
Came up with this solution while writing this question but thought I'd post anyways:
df.assign(
new_col = lambda df_: df_.merge(df.groupby('group')
.apply(lambda x: x.loc[x.col_1==0].col_2)
.reset_index().rename(columns={'col_2':'new_col'}), on='group'
).new_col
)
Is there a better way?
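Use Series.where to replace every col_2 value with NaN where col_1 is not 0, then use GroupBy.transform with 'first' to broadcast the first non-NaN value of each group to all of its rows. Because of the intermediate NaN values, the result is float: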
df = df.assign(
new_col = lambda df_: df_['col_2'].where(df_['col_1'] == 0)
.groupby(df_['group']).transform('first')
)
print(df)
group col_1 col_2 new_col
0 A -1 5 8.0
1 A 0 8 8.0
2 A 1 9 8.0
3 B -1 5 0.0
4 B 0 0 0.0
5 B 1 0 0.0
6 C 0 1 1.0
7 C 1 7 1.0
8 D -1 6 9.0
9 D 0 9 9.0
10 D 1 2 9.0
11 D 2 4 9.0
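The float values in new_col come from the NaN that where introduces before the group-wise step. A minimal sketch of the two steps taken separately, with an optional cast back to integers. It assumes every group has at least one col_1 == 0 row, and the col_2 values are hard-coded from the table in the question instead of being regenerated from the seed:

```python
import pandas as pd

# Data copied from the question's printed table (col_2 hard-coded, not re-seeded)
df = pd.DataFrame({
    'group': list('AAABBBCCDDDD'),
    'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
    'col_2': [5, 8, 9, 5, 0, 0, 1, 7, 6, 9, 2, 4],
})

# Step 1: keep col_2 only where col_1 == 0, NaN elsewhere (dtype becomes float64)
masked = df['col_2'].where(df['col_1'] == 0)

# Step 2: broadcast each group's first non-NaN value to every row of that group
first_per_group = masked.groupby(df['group']).transform('first')

# Optional: cast back to int; this raises if a group had no col_1 == 0 row,
# because the NaN left in that group cannot be converted to an integer
out = df.assign(new_col=first_per_group.astype(int))
print(out)
```

If some groups may lack a col_1 == 0 row, leaving the column as float (or using a nullable integer dtype) is probably the safer choice.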
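Another solution uses Series.map: filter the rows where col_1 == 0, use DataFrame.set_index to index them by the group column, and map each row's group to the matching col_2 value. This keeps the integer dtype, but it requires at most one col_1 == 0 row per group, since map needs a unique index to look up: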
df = df.assign(
new_col = lambda df_: df_['group'].map(df_.loc[df_['col_1'] == 0]
.set_index('group')['col_2'])
)
print(df)
group col_1 col_2 new_col
0 A -1 5 8
1 A 0 8 8
2 A 1 9 8
3 B -1 5 0
4 B 0 0 0
5 B 1 0 0
6 C 0 1 1
7 C 1 7 1
8 D -1 6 9
9 D 0 9 9
10 D 1 2 9
11 D 2 4 9
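To see what map is looking up, it may help to build the intermediate Series on its own. The sketch below does that and adds a uniqueness check before mapping; the check is my own addition, not part of the answer, and the col_2 values are hard-coded from the table in the question:

```python
import pandas as pd

# Data copied from the question's printed table (col_2 hard-coded, not re-seeded)
df = pd.DataFrame({
    'group': list('AAABBBCCDDDD'),
    'col_1': [-1, 0, 1, -1, 0, 1, 0, 1, -1, 0, 1, 2],
    'col_2': [5, 8, 9, 5, 0, 0, 1, 7, 6, 9, 2, 4],
})

# Lookup Series: index is the group label, value is col_2 from the col_1 == 0 row
lookup = df.loc[df['col_1'] == 0].set_index('group')['col_2']
print(lookup)

# map needs a unique index in the lookup, so fail early with a clear message
if not lookup.index.is_unique:
    raise ValueError('more than one col_1 == 0 row in at least one group')

# Groups missing from the lookup would get NaN here (none are missing in this data)
out = df.assign(new_col=df['group'].map(lookup))
print(out)
```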