Tags: python, pandas, mutable

Pandas dataframe mutability with loc method


I am trying to understand the intricacies of using loc on a DataFrame. Suppose we have the following:

import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
df2 = df.loc[:,'a']   # select column 'a' with a single label
df2.loc[0] = 10
print(df)
print(df2)

Output:

    a  b
0  10  4
1   2  5
2   3  6
0    10
1     2
2     3
Name: a, dtype: int64


df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
df3 = df.loc[:,['a']]   # select column 'a' with a list of labels
df3.loc[0] = 10
print(df)
print(df3)

Output:

   a  b
0  1  4
1  2  5
2  3  6
    a
0  10
1   2
2   3

Why does the first piece of code modify the original dataframe, whereas the second does not?


Solution

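  • Because in your first snippet, df2 is a view of df: it shares df's data, so writing to df2 also writes into df. This is the behavior without Copy-on-Write; with Copy-on-Write (the default from pandas 3.0), any object derived from df behaves as a copy and the first snippet leaves df unchanged too. You can check whether an object is a view with: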

    import pandas as pd

    df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
    df2 = df.loc[:,'a']
    
    df2._is_view   # private attribute: fine for inspection, not for production code
    # True
    

    Use copy() to make sure you get an independent copy:

    df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
    df2 = df.loc[:,'a'].copy()
    
    df2._is_view
    # False
    

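    Why?

    The difference comes from the indexer, and the shape of the result is a symptom of it: a single label ('a') returns that column as a Series (1D) that shares the frame's data, while a list of labels (['a']) goes through list-based indexing, which, like NumPy fancy indexing, returns a copy, here a DataFrame (2D):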

    df = pd.DataFrame({'a':[1,2,3], 'b':[4,5,6]})
    
    df.loc[:,'a'].shape
    # (3,)  -> this is 1D (Series)
    df.loc[:,'a'].ndim
    # 1
    
    df.loc[:,['a']].shape
    # (3,1) -> this is 2D (DataFrame)
    df.loc[:,['a']].ndim
    # 2
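A way to check this without the private _is_view attribute is simply to modify the selection and look at df afterwards. The sketch below (the names s and sub are only for illustration) relies on behavior that is the same with or without Copy-on-Write: an explicit copy() and a list-of-labels selection both give objects whose changes do not reach df.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# An explicit .copy() owns its data, whatever the pandas version
# or Copy-on-Write setting.
s = df.loc[:, 'a'].copy()
s.loc[0] = 10

# A list of labels goes through list-based indexing, which returns a copy,
# so writing to this DataFrame does not touch df either.
sub = df.loc[:, ['a']]
sub.loc[0, 'a'] = 20

print(df['a'].tolist())   # [1, 2, 3]: df is unchanged
print(s.tolist())         # [10, 2, 3]
print(sub['a'].tolist())  # [20, 2, 3]
```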
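If the goal is actually to change df, it is safer not to depend on an intermediate object being a view at all. A minimal sketch, assuming you want to set cells of the original frame: pass the row selector and the column label to a single .loc call on df itself, which writes into df in every pandas version.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Row label and column label in one .loc call on df: writes into df directly.
df.loc[0, 'a'] = 10

# A boolean mask works the same way: set 'a' to 0 wherever 'b' > 5.
df.loc[df['b'] > 5, 'a'] = 0

print(df['a'].tolist())  # [10, 2, 0]
```

Because only one indexing operation is involved, there is no intermediate Series whose view-or-copy status could change the result.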