python pandas dataframe error-handling divide-by-zero

New columns in Pandas Loop with ZeroDivisionError exception


I am trying to create some new columns in a dataframe which are ratios of existing columns:

df[e] = df[a]/df[b]
df[f] = df[c]/df[d]
df[g] = df[a]/df[d]
df[h] = df[b]/df[c]
...

Since some values in the columns are zeros, the code above raises a ZeroDivisionError. I tried to fix it manually with:

try:
    df[e] = df[a]/df[b]
except ZeroDivisionError:
    df[e] = np.nan
try:
    df[f] = df[c]/df[d]
except ZeroDivisionError:
    df[f] = np.nan
try:
    df[g] = df[a]/df[d]
except ZeroDivisionError:
    df[g] = np.nan
...

But with this code all the rows in the new columns are then np.nan instead of only those which would raise the ZeroDivisionError.

So how can I do this correctly? Ideally I would use a for loop over the new columns instead of handling each new column manually, as I tried in the second code block.

Thank you very much!


Solution

  • For numeric columns, pandas should not raise a ZeroDivisionError on division by zero; it sets the result to inf (or NaN for 0/0) instead:

    import numpy as np
    import pandas as pd

    np.random.seed(42)
    df = pd.DataFrame(np.random.choice(range(3), size=(5,4)), columns=list('abcd'))
    df['e'] = df['a']/df['b']
    

    output:

       a  b  c  d    e
    0  2  0  2  2  inf
    1  0  0  2  1  NaN
    2  2  2  2  2  1.0
    3  0  2  1  0  0.0
    4  1  1  1  1  1.0
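
One possible explanation for the ZeroDivisionError in the question, offered as an assumption since the original data is not shown: if the columns have object dtype (for example Python ints read in as objects), the division falls back to plain Python arithmetic, which does raise. A minimal sketch with hypothetical data, converting with pd.to_numeric before dividing:

```python
import numpy as np
import pandas as pd

# Hypothetical data: Python ints stored in object-dtype columns.
df = pd.DataFrame({"a": [2, 0, 1], "b": [0, 0, 1]}, dtype=object)

try:
    df["e"] = df["a"] / df["b"]
    raised = False
except ZeroDivisionError:
    # Object columns are divided element by element in plain Python,
    # so a zero denominator can raise here.
    raised = True

# Converting to a numeric dtype first lets pandas return inf/NaN instead.
num = df[["a", "b"]].apply(pd.to_numeric)
ratio = num["a"] / num["b"]
```

Checking df.dtypes on the real dataframe would show whether this applies.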
    

    Note that you can also perform all computations in one shot:

    np.random.seed(42)
    df = pd.DataFrame(np.random.choice(range(3), size=(5,4)), columns=list('abcd'))
    
    df.loc[:, ['e', 'f', 'g', 'h']] = df[['a', 'c', 'a', 'b']].div(df[['b', 'd', 'd', 'c']].values, axis=1).values
    

    output:

       a  b  c  d    e    f    g    h
    0  2  0  2  2  inf  1.0  1.0  0.0
    1  0  0  2  1  NaN  2.0  0.0  0.0
    2  2  2  2  2  1.0  1.0  1.0  1.0
    3  0  2  1  0  0.0  inf  NaN  2.0
    4  1  1  1  1  1.0  1.0  1.0  1.0
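
The question also asks for a for loop and for NaN in the rows where the denominator is zero. A minimal sketch of one way to do that, reusing the sample data above; the `ratios` mapping is a hypothetical helper, not something from the original post, and replacing inf with NaN is an extra step beyond what the answer shows:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.choice(range(3), size=(5, 4)), columns=list("abcd"))

# Hypothetical mapping: new column -> (numerator column, denominator column).
ratios = {"e": ("a", "b"), "f": ("c", "d"), "g": ("a", "d"), "h": ("b", "c")}

for new_col, (num, den) in ratios.items():
    # Division by zero yields inf (x/0) or NaN (0/0);
    # map both infinities to NaN so every such row ends up as NaN.
    df[new_col] = (df[num] / df[den]).replace([np.inf, -np.inf], np.nan)
```

Only the rows with a zero denominator become NaN; all other rows keep their computed ratio, and no try/except is needed.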