
Pandas ".loc" method not working as expected


Why does this work?

import pandas as pd

numbers = {'mynumbers': [51, 52, 53, 54, 55]}
df = pd.DataFrame(numbers, columns=['mynumbers'])
df.loc[df['mynumbers'] <= 53, 'mynumbers'] = 'True'
print(df)

Output:

   mynumbers
0       True
1       True
2       True
3         54
4         55

But this raises an error:

import pandas as pd

numbers = {'mynumbers': [51, 52, 53, 54, 55]}
df = pd.DataFrame(numbers, columns=['mynumbers'])
print(df.loc[df['mynumbers']])

If in the first case I can use "df.loc[df['mynumbers']]" as a condition to compare values, why do I get an error when I try to print that expression on its own?

I understand that the values I pass to the .loc method raise a KeyError because no such keys exist in the index, but I do not understand why it works in the first instance.


Solution

  • When you do df['mynumbers'] <= 53 you use a boolean indexer, that is, a Series that has the same index as df and True or False as its values:

    >>> df['mynumbers'] <= 53
    0     True
    1     True
    2     True
    3    False
    4    False
    Name: mynumbers, dtype: bool
    

    This can be passed to df.loc[] or df[]:

    >>> df[df['mynumbers'] <= 53]
       mynumbers
    0         51
    1         52
    2         53
    >>> df.loc[df['mynumbers'] <= 53, :]
       mynumbers
    0         51
    1         52
    2         53
    

    The other way to use df.loc[] is to pass in index labels (a list of labels, a slice of the index, or the index itself):

    >>> df.loc[df.index]
       mynumbers
    0         51
    1         52
    2         53
    3         54
    4         55
    >>> df.loc[df.index[3:]]
       mynumbers
    3         54
    4         55
    >>> df.loc[[1, 2]]
       mynumbers
    1         52
    2         53
    

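    In your first snippet the row key is the comparison df['mynumbers'] <= 53, which is a boolean indexer, so it works. However, when you do df.loc[df['mynumbers']] you're doing neither of those two things: df['mynumbers'] holds the values 51 to 55, so .loc treats them as index labels and looks them up in the index, which only contains 0 to 4. None of them are found, as the following error shows: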

    KeyError: "None of [Int64Index([51, 52, 53, 54, 55], dtype='int64')] are in the [index]"
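A self-contained sketch that reproduces the cases above in one script. The names mask, out and key_error are illustrative and not from the question. The last step mirrors the question's assignment, but casts the column to object first, on the assumption that recent pandas versions warn when a string is written into an int64 column:

```python
import pandas as pd

df = pd.DataFrame({'mynumbers': [51, 52, 53, 54, 55]})

# Case 1: a comparison returns a boolean Series that shares df's index.
mask = df['mynumbers'] <= 53
print(mask.tolist())  # [True, True, True, False, False]

# Case 2: a boolean mask as the row key keeps the rows where it is True.
print(df.loc[mask])

# Case 3: a list of index labels selects rows by label.
print(df.loc[[1, 2]])

# Case 4: the column's values (51..55) are treated as row labels.
# The index only holds 0..4, so none are found and pandas raises KeyError.
key_error = None
try:
    df.loc[df['mynumbers']]
except KeyError as err:
    key_error = err
    print('KeyError:', err)

# The question's first snippet: mask as the row key, 'mynumbers' as the
# column key. Only rows where the mask is True are overwritten.
# Cast to object first (assumption: recent pandas versions warn when a
# string is written into an int64 column).
out = df.astype(object)
out.loc[mask, 'mynumbers'] = 'True'
print(out['mynumbers'].tolist())  # ['True', 'True', 'True', 54, 55]
```

The exact KeyError text depends on the pandas version: older releases show Int64Index, as in the error above, while newer ones show Index.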