I have created a dataframe called df with this code:
import numpy as np
import pandas as pd
# Initialize the data as a dict of lists.
data = {'Feature1': [1, 2, -9999999, 4, 5],
        'Age': [20, 21, 19, 18, 34]}
# Create DataFrame
df = pd.DataFrame(data)
print(df)
The dataframe looks like this:
   Feature1  Age
0         1   20
1         2   21
2  -9999999   19
3         4   18
4         5   34
Every time there is a value of -9999999 in column Feature1, I need to replace it with the corresponding value from column Age, so the output dataframe would look like this:
   Feature1  Age
0         1   20
1         2   21
2        19   19
3         4   18
4         5   34
Bear in mind that the actual dataframe that I am using has 200K records (the one I have shown above is just an example).
How do I do that in pandas?
You can use np.where or Series.mask:
df['Feature1'] = df['Feature1'].mask(df['Feature1'].eq(-9999999), df['Age'])
# or
df['Feature1'] = np.where(df['Feature1'].eq(-9999999), df['Age'], df['Feature1'])
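For reference, here is a minimal self-contained sketch that runs both options on the sample data from the question and checks that they agree. The names SENTINEL, is_missing, via_mask and via_where are only illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

SENTINEL = -9999999  # placeholder value from the question (name is illustrative)

df = pd.DataFrame({'Feature1': [1, 2, SENTINEL, 4, 5],
                   'Age': [20, 21, 19, 18, 34]})

# Boolean Series marking the rows where Feature1 holds the placeholder
is_missing = df['Feature1'].eq(SENTINEL)

# Option 1: Series.mask replaces values where the condition is True
via_mask = df['Feature1'].mask(is_missing, df['Age'])

# Option 2: np.where takes Age where the condition is True, else Feature1;
# it returns a NumPy array, so wrap it to keep the original index
via_where = pd.Series(np.where(is_missing, df['Age'], df['Feature1']),
                      index=df.index)

df['Feature1'] = via_mask
print(df)
```

Both options are vectorized and avoid a Python-level loop over rows, so they should handle a dataframe of around 200K records without trouble.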