Tags: python, pandas, sum, cumsum

Setting pandas global default for skipna to False


Certain pandas functions, such as sum(), cumsum() and cumprod(), take a skipna option, which is set to True by default. This causes issues for me because errors might propagate silently, so I always set skipna to False explicitly.

sum_df = df.sum(skipna=False)

Doing this every time one of these functions appears makes the code look a bit unwieldy. Is there a way to change the default behaviour in pandas?


Solution

  • Option is not an option (yet)

    There does not seem to be an option that controls this behaviour. The default is hard-coded in the method signature:

    import inspect
    import pandas as pd
    inspect.getfile(pd.DataFrame.sum)    # './pandas/core/generic.py'
    inspect.getsource(pd.DataFrame.sum)
    
    # @Substitution(outname=name, desc=desc, name1=name1, name2=name2,
    #                  axis_descr=axis_descr, min_count=_min_count_stub,
    #                  see_also=see_also, examples=examples)
    # @Appender(_num_doc)
    # def stat_func(self, axis=None, skipna=None, level=None, numeric_only=None,
    # [...]
    

    It could be a good idea for a pull request.
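As a quicker check than reading the source, the declared default can also be read from the method signature. This is only a minimal sketch: the value reported depends on the installed pandas version, so no output is shown here.

```python
import inspect

import pandas as pd

# Look up the declared default of `skipna` for a few reductions.
# The reported value depends on the installed pandas version.
for name in ["sum", "cumsum", "cumprod"]:
    method = getattr(pd.DataFrame, name)
    params = inspect.signature(method).parameters
    if "skipna" in params:
        print(name, "skipna default:", params["skipna"].default)
    else:
        print(name, "does not declare skipna explicitly")
```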

    A simple solution

    This is probably not the best solution; it is a bit hackish, but it does address your problem.

    I am not saying it is good practice in general. It may have drawbacks that I have not addressed (you are welcome to list them in the comments). Still, this solution has the advantage of being non-intrusive.

    Additionally, although it is quite a simple technique and uses only the Python Standard Library (PSL), it may violate the Principle of Least Astonishment (see this answer for details).

    MCVE

    Let's build a wrapper that overrides existing default parameters or adds extra ones:

    import functools

    def set_default(func, **default):
        @functools.wraps(func)                # Keep the original name and docstring
        def inner(*args, **kwargs):
            merged = {**default, **kwargs}    # Defaults first, so explicit caller kwargs win
            return func(*args, **merged)      # Call function w/ merged kwargs
        return inner                          # Return decorated function
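To see the wrapper's behaviour without pandas, here is a small self-contained sketch. `with_defaults` and `greet` are hypothetical names used only for illustration. This variant merges the injected defaults first, so a keyword passed explicitly by the caller still takes precedence.

```python
import functools


def with_defaults(func, **defaults):
    """Return a wrapper that fills in keyword defaults the caller omitted."""
    @functools.wraps(func)
    def inner(*args, **kwargs):
        # Injected defaults go first, so explicit caller kwargs override them.
        return func(*args, **{**defaults, **kwargs})
    return inner


def greet(name, punctuation="."):
    return "Hello, " + name + punctuation


excited = with_defaults(greet, punctuation="!")

print(excited("Ada"))                   # Hello, Ada!
print(excited("Ada", punctuation="?"))  # Hello, Ada?
print(excited.__name__)                 # greet
```

One limitation of this approach: a value passed positionally for the same parameter still clashes with the injected keyword and raises a TypeError.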
    

    Then, we can decorate any function. For instance:

    import pandas as pd
    pd.DataFrame.sum = set_default(pd.DataFrame.sum, skipna=False)
    

    From then on, the sum method of DataFrame objects uses skipna=False whenever it is called without an explicit skipna argument. Now the following code:

    import numpy as np
    df = pd.DataFrame([1., 2., np.nan])
    df.sum()
    

    Returns:

    0   NaN
    dtype: float64
    

    Instead of:

    0    3.0
    dtype: float64
    

    Automation

    We can apply this modification to many functions at once:

    for key in ['sum', 'mean', 'std']:
        setattr(pd.DataFrame, key, set_default(getattr(pd.DataFrame, key), skipna=False))
    

    If we store these modifications in a Python module (a .py file), they are applied when the module is imported, without any need to modify the pandas code itself.