python pandas dataframe pandas-loc

Does .loc in Python Pandas make an in-place change to the original DataFrame?


I was working with a DataFrame like the one below:

df:

Site   Visits   Temp   Type
KFC    511      74     Food
KFC    565      77     Food
KFC    498      72     Food
K&G    300      75     Gas
K&G    255      71     Gas

I wanted to convert the 'Type' column into a 0/1 variable so I could use df.corr() to check its correlation with the other columns.

I tried two ways. The first was to build a dictionary and map it onto a new column:

type_map = {'Food': 1, 'Gas': 0}  # avoid shadowing the built-in dict
df['BinaryType'] = df['Type'].map(type_map)

I was then able to use df.corr() to check the correlation between 'Visits' and 'BinaryType'. Since the 'Type' column contains strings, df.corr() would not show a correlation between 'Visits' and 'Type'.

The second way was to use .loc:

df.loc[df['Type']=='Food','Type'] = 1
df.loc[df['Type']!=1,'Type'] = 0

Then I checked df in the console. It looked like the output below, so it seemed that an in-place change had been made. I also checked a single value using df['Type'][0], and it read 1 (which I assume is an integer):

Site   Visits   Temp   Type
KFC    511      74     1
KFC    565      77     1
KFC    498      72     1
K&G    300      75     0
K&G    255      71     0

Here, however, df.corr() would not show the correlation between 'Visits' and 'Type'! It was as if this column hadn't been changed.

You can use the code below to reproduce:

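import pandas as pd

df = pd.DataFrame({
    'Site': {0: 'KFC', 1: 'KFC', 2: 'KFC', 3: 'K&G', 4: 'K&G'},
    'Visits': {0: 511, 1: 565, 2: 498, 3: 300, 4: 255},
    'Temp': {0: 74, 1: 77, 2: 72, 3: 75, 4: 71},
    'Type': {0: 'Food', 1: 'Food', 2: 'Food', 3: 'Gas', 4: 'Gas'}})

# Note: on pandas >= 2.0, call df.corr(numeric_only=True);
# otherwise the string columns raise an error.

# 1: map the labels to a new numeric column
type_map = {'Food': 1, 'Gas': 0}  # avoid shadowing the built-in dict
df['BinaryType'] = df['Type'].map(type_map)
df.corr()
del df['BinaryType']

# 2: overwrite the labels in place with .loc
df.loc[df['Type'] == 'Food', 'Type'] = 1
df.loc[df['Type'] != 1, 'Type'] = 0
df.corr()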

Any idea how Pandas .loc works behind the scenes?


Solution

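  • Your second method doesn't actually change the dtype of the series, even though every value is now an int. You can see this with df.dtypes, which shows the Type column is still of object dtype, so df.corr() leaves it out just as it did when the column held strings.

    You need to cast the column to int explicitly with df['Type'] = df['Type'].astype(int)

    OR

    skip the two .loc assignments and build the column from the original strings with df['Type'] = np.where(df['Type'] == 'Food', 1, 0) (after import numpy as np)

    Running df.corr() after either fix gives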

    In [22]: df.corr()
    Out[22]:
              Visits      Temp      Type
    Visits  1.000000  0.498462  0.976714
    Temp    0.498462  1.000000  0.305888
    Type    0.976714  0.305888  1.000000
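
The points above can be checked with a small sketch. It drops the Site column and builds Type with an explicit object dtype (an assumption made here so the behaviour is the same across pandas versions). The names dtype_after_loc, fixed_astype, fixed_where and labels are illustrative only and are not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Visits': [511, 565, 498, 300, 255],
    'Temp': [74, 77, 72, 75, 71],
})
# Assumption: store the labels as object dtype, as in the question's setup.
labels = pd.Series(['Food', 'Food', 'Food', 'Gas', 'Gas'], dtype=object)
df['Type'] = labels

# The .loc assignments do modify df in place...
df.loc[df['Type'] == 'Food', 'Type'] = 1
df.loc[df['Type'] != 1, 'Type'] = 0

# ...but the column keeps its object dtype, even though it now holds ints.
dtype_after_loc = df['Type'].dtype

# Fix 1: cast the column explicitly.
fixed_astype = df.assign(Type=df['Type'].astype(int))

# Fix 2: build the 0/1 column from the original string labels.
fixed_where = df.assign(Type=np.where(labels == 'Food', 1, 0))

# With an integer dtype, Type takes part in the correlation matrix.
corr = fixed_astype.corr()
print(dtype_after_loc)
print(corr)
```

Here corr.loc['Visits', 'Type'] should agree with the 0.976714 shown in the output above.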