python pandas dataframe divide-by-zero

Python avoid dividing by zero in pandas dataframe


Apologies, I know this has been asked before, but I cannot get those solutions to work for me (I am a native MATLAB user coming to Python).

I have two dataframes: I take the row-wise mean of the first 7 columns of one (df) and divide it by the row-wise mean of the same columns of the other (other). However, there are many zeros in this dataset, and I want division by zero to produce zero (which is meaningful to me) instead of the nan that my current implementation returns.

My code so far:

col_ind = list(range(0,7))
df.iloc[:,col_ind].mean(axis=1)/other.iloc[:,col_ind].mean(axis=1)

Here, where the mean of other is 0 the result is nan, and where the mean of df is 0 the result is 0. I have tried many of the proposed solutions, but none seem to have any effect. For instance:

def foo(x,y):
    try:
        return x/y
    except ZeroDivisionError:
        return 0

foo(df.iloc[:,col_ind].mean(axis=1),other.iloc[:,col_ind].mean(axis=1))

However, this returns the same values as before, as if the except branch in foo were never used. I suspect this is because I am operating on Series rather than single values, but I'm not sure, nor do I know how to fix it. There are also genuine nans in these dataframes. Any help appreciated.
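
The suspicion about Series appears to be right. A minimal sketch, using made-up toy Series (num and den are hypothetical stand-ins for the two row-wise means), suggests that dividing float Series does not raise ZeroDivisionError at all, so the except branch is never entered:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the two row-wise means.
num = pd.Series([1.0, 0.0, 3.0, np.nan])
den = pd.Series([0.0, 0.0, 2.0, 1.0])

def foo(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        # Not reached for Series: the division is element-wise and
        # yields inf or nan instead of raising an exception.
        return 0

result = foo(num, den)
print(result.tolist())  # [inf, nan, 1.5, nan]
```

Only plain Python scalars (for example 1 / 0) raise ZeroDivisionError, which is why the try/except has no visible effect here.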


Solution

  • You can use np.where to do this conditionally as a vectorised calculation.

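    import numpy as np
    import pandas as pd
    
    # Example data: seven value columns (1-9) plus a divisor column "div" that contains zeros
    df = pd.DataFrame(data=np.concatenate([np.random.randint(1,10, (10,7)), np.random.randint(0,3,(10,1))], axis=1),
                columns=[f"col_{i}" for i in range(7)]+["div"])
    
    # Row-wise mean / divisor where the divisor is positive, otherwise 0
    # (use .ne(0) instead of .gt(0) if the divisor can be negative)
    np.where(df["div"].gt(0), (df.loc[:,[c for c in df.columns if "col" in c]].mean(axis=1) / df["div"]), 0)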
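
The same idea can be adapted to the two-dataframe setup from the question. The following is only a sketch under assumptions: the data in df and other is invented for illustration, and Series.mask is swapped in for np.where so that the result stays a pandas Series. Rows whose divisor is exactly zero are set to 0, while rows with a genuine nan in the divisor remain nan.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's `df` and `other`.
df = pd.DataFrame(np.arange(21, dtype=float).reshape(3, 7))
other = pd.DataFrame(np.ones((3, 7)))
other.iloc[0, :] = 0.0      # row 0: divisor mean is exactly zero
other.iloc[2, :] = np.nan   # row 2: divisor mean is a genuine nan

col_ind = list(range(7))
num = df.iloc[:, col_ind].mean(axis=1)
den = other.iloc[:, col_ind].mean(axis=1)

# Element-wise division; zero divisors give inf or nan rather than an error.
ratio = num / den

# Overwrite only the rows whose divisor is exactly zero.
# den.eq(0) is False for nan, so genuine nans are left untouched.
ratio = ratio.mask(den.eq(0), 0.0)
print(ratio.tolist())  # [0.0, 10.0, nan]
```

If genuine nans should also become 0, a fillna(0) on the result would do that, but that is a separate decision from handling the zero divisors.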