Tags: python, pandas, dataframe, where-clause, mask

How to change values in one column and generate a new DataFrame in Python


I have a DataFrame and I want to generate a new one by changing the values in just one column, while keeping the original DataFrame intact. I have tried mask, where and iloc, but the original DataFrame always changes.

import pandas as pd

data = {
  "age": [50, 40, 30, 40, 20, 10, 30],
  "qualified": [True, False, False, False, False, True, True]
}
df = pd.DataFrame(data)

newdf = df
newdf["age"] = newdf["age"].where(newdf["age"] > 30, 2)

print(newdf)
print(df)

Result:

   age  qualified
0   50       True
1   40      False
2    2      False
3   40      False
4    2      False
5    2       True
6    2       True
   age  qualified
0   50       True
1   40      False
2    2      False
3   40      False
4    2      False
5    2       True
6    2       True

Is there some way to change these values and keep the original?


Solution

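  • Use df.copy(deep=True), which gives newdf its own data. Plain assignment (newdf = df) only binds a second name to the same DataFrame, so every change made through newdf also shows up in df. See also: What is the difference between a deep copy and a shallow copy?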

    import pandas as pd
    import numpy as np
    
    data = {
      "age": [50, 40, 30, 40, 20, 10, 30],
      "qualified": [True, False, False, False, False, True, True]
    }
    df = pd.DataFrame(data)
    
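    # Deep copy: newdf gets its own data, so df is left untouched
    newdf = df.copy(deep=True)
    
    # Keep ages above 30 and replace all other ages with 2
    newdf["age"] = np.where(newdf["age"] > 30, newdf["age"], 2)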
    print(newdf)
       age  qualified
    0   50       True
    1   40      False
    2    2      False
    3   40      False
    4    2      False
    5    2       True
    6    2       True
    
    print(df)
       age  qualified
    0   50       True
    1   40      False
    2   30      False
    3   40      False
    4   20      False
    5   10       True
    6   30       True
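
As a possible alternative to copying first, the sketch below uses DataFrame.assign together with Series.where, which should build the new DataFrame in one step without touching df. It also checks that plain assignment only creates a second name for the same object, which is what caused the problem in the question. The names alias and same_object are made up for this illustration and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "age": [50, 40, 30, 40, 20, 10, 30],
    "qualified": [True, False, False, False, False, True, True],
})

# Plain assignment does not copy anything: both names refer to one object.
alias = df
same_object = alias is df
print(same_object)  # True

# Series.where keeps values where the condition holds and replaces the rest with 2.
# assign returns a new DataFrame, so df itself is not modified.
newdf = df.assign(age=df["age"].where(df["age"] > 30, 2))

print(newdf["age"].tolist())  # [50, 40, 2, 40, 2, 2, 2]
print(df["age"].tolist())     # [50, 40, 30, 40, 20, 10, 30]
```

Whether to prefer copy() followed by an in-place change or assign is mostly a matter of taste; assign can be convenient when only one or two columns differ from the original.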