python pandas replace mean

How to replace irrelevant data with mean values?


Let's say I have 600,000 data points in a column for age. The data contains the values 0 and -1, which are not valid ages. How can I replace both 0 and -1 with the column mean using Python?

The code so far:

df6 = df5['Vict Age'].replace([0, -1]).mean())
df6.update(df5)
df6

Solution

  • You can compute the mean separately and then use the correct replace syntax to replace the desired values:

    # Calculate mean ignoring -1, 0 values
    age_mean = df5['Vict Age'][~df5['Vict Age'].isin([-1,0])].mean()
    # Replace -1, 0 values
    df5['Vict Age'] = df5['Vict Age'].replace({0: age_mean, -1: age_mean})
    

    PS: Please use Stack Overflow code formatting instead of posting the image in the future. Thanks.
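
As a variation on the same idea, here is a minimal, self-contained sketch that builds the "invalid" mask once and reuses it for both the mean and the replacement, using `Series.mask` instead of `replace`. The sample values are made up for illustration; only the `df5` name and the `Vict Age` column come from the question.

```python
import pandas as pd

# Hypothetical sample data standing in for the real df5 (assumption, not from the question)
df5 = pd.DataFrame({"Vict Age": [25, 0, 40, -1, 31]})

# Boolean mask marking the placeholder values that are not real ages
invalid = df5["Vict Age"].isin([0, -1])

# Mean of the valid ages only: (25 + 40 + 31) / 3 = 32.0
age_mean = df5.loc[~invalid, "Vict Age"].mean()

# mask() swaps in age_mean wherever the condition is True and keeps other values
df5["Vict Age"] = df5["Vict Age"].mask(invalid, age_mean)

print(df5)
```

Computing the mean over the valid rows only matters here: including the 0 and -1 entries would pull the mean down and the replacement values would be too low.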