Tags: python, pandas, lambda, pandas-loc

What's the difference between these 2 methods of Series filtering? (with or without lambda)


I have a pandas Series called snow (the amount of snow in different months of the year).

The two lines of code below produce the same result (or at least they seem to).

So I'd like to know what the difference between them is.

import pandas as pd

snow.loc[(snow.index.month==1) & (snow>0)]
snow.loc[lambda s: (s.index.month==1) & (s>0)]

Solution

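  • There is no difference between the two lines as long as you're not chaining method calls. Passing a function/lambda to loc ensures that the mask is computed from the current Series/DataFrame, i.e. the object that loc is being called on, rather than from whatever the variable snow refers to.

    The results can differ in a method chain, because the intermediate object is no longer the same as snow.

    Example: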

    snow = pd.Series([0, 1, 0, 1], index=pd.to_datetime(['2023-01-01', '2023-01-15', '2023-02-01', '2023-02-15']))
    
    (snow
     .add(2)
     # here we reference the series independently
     # of the previous chained commands
     .loc[(snow.index.month==1) & (snow>0)]
    )
    
    # 2023-01-15    3
    # dtype: int64
    
    (snow
     .add(2)
     # here we reference the current state of the Series
     .loc[lambda s: (s.index.month==1) & (s>0)]
    )
    
    # 2023-01-01    2
    # 2023-01-15    3
    # dtype: int64
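
The behaviour described above can be checked side by side. The following is a minimal sketch, assuming the same toy snow Series as in the example; the DataFrame at the end, with its column names month, snow and doubled, is hypothetical and only illustrates that a callable can see a column created earlier in the chain.

```python
import pandas as pd

snow = pd.Series(
    [0, 1, 0, 1],
    index=pd.to_datetime(["2023-01-01", "2023-01-15", "2023-02-01", "2023-02-15"]),
)

# Not chained: both forms build the mask from the same Series.
direct = snow.loc[(snow.index.month == 1) & (snow > 0)]
via_lambda = snow.loc[lambda s: (s.index.month == 1) & (s > 0)]
same_when_unchained = direct.equals(via_lambda)

# Chained: the explicit mask still comes from the original `snow`,
# while the lambda receives the result of .add(2).
shifted = snow.add(2)
chained_direct = shifted.loc[(snow.index.month == 1) & (snow > 0)]
chained_lambda = shifted.loc[lambda s: (s.index.month == 1) & (s > 0)]

# The lambda form matches naming the intermediate and masking on it.
explicit = shifted.loc[(shifted.index.month == 1) & (shifted > 0)]

# Hypothetical DataFrame: `doubled` exists only inside the chain,
# so only a callable can refer to it without an intermediate variable.
df = pd.DataFrame({"month": [1, 1, 2], "snow": [0, 1, 1]})
with_extra = (
    df
    .assign(doubled=lambda d: d["snow"] * 2)
    .loc[lambda d: d["doubled"] > 0]
)
```

If you prefer not to use a lambda, assigning the intermediate result to a variable (shifted here) and building the mask from that variable gives the same result; the callable just avoids breaking the chain.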