I have a DataFrame with a column that has some bad data with various negative values. I would like to replace values < 0 with the mean of the group that they are in.
For missing values (NaN), I would do:
data = df.groupby(['GroupID']).column
data.transform(lambda x: x.fillna(x.mean()))
But how do I do this operation on a condition like x < 0?
Thanks!
Using @AndyHayden's example, you could use groupby/transform with a custom replace function:
import pandas as pd
df = pd.DataFrame([[1,1],[1,-1],[2,1],[2,2]], columns=list('ab'))
print(df)
#    a  b
# 0  1  1
# 1  1 -1
# 2  2  1
# 3  2  2
data = df.groupby(['a'])
def replace(group):
    # Work on a copy so the object handed in by groupby is not mutated.
    group = group.copy()
    mask = group < 0
    # Select those values where it is < 0, and replace
    # them with the mean of the values which are not < 0.
    group[mask] = group[~mask].mean()
    return group
print(data.transform(replace))
#    b
# 0  1
# 1  1
# 2  1
# 3  2
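If you would rather avoid a Python-level function per group, the same idea can probably be expressed with vectorized calls. This is only a sketch on the example frame above: it hides the negative values as NaN, computes the group mean of what is left, and fills the gaps, which is essentially the fillna idiom from the question. The names valid, group_mean and result are just illustrative, and the result comes back as floats because of the intermediate NaN.

```python
import pandas as pd

df = pd.DataFrame([[1, 1], [1, -1], [2, 1], [2, 2]], columns=list("ab"))

# Hide the negative values as NaN so they do not contribute to the group mean.
valid = df["b"].where(df["b"] >= 0)

# For every row, the mean of the non-negative values in its group of column "a".
group_mean = valid.groupby(df["a"]).transform("mean")

# Keep the valid values and fill the hidden ones with their group mean.
result = valid.fillna(group_mean)
print(result.tolist())
```

Note that if a group contains only negative values, its mean is NaN and those rows stay NaN, so you may want to decide separately how to handle that case.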