Tags: python, pandas, nan, method-chaining

Performing operations on a column with NaNs without removing them


I currently have a data frame like so:

treated  control
9.5      9.6
10       5
6        0
6        6

I want to compute the log2 ratio between treated and control, i.e. log2(treated/control). However, math.log2() breaks because of the 0 values in the control column (a division by zero). Ideally, I would like to compute the log2 ratio using method chaining, e.g. with df.assign(), and simply put NaN where the ratio cannot be computed, like so:

treated  control  log_2_ratio
9.5      9.6      -0.0151
10       5        1
6        0        nan
6        6        0

I have managed to do this in an extremely round-about way, where I have:

  • made a column ratio, which is treated/control
  • done new_df = df.dropna() on this dataframe
  • applied log2 to this
  • left-joined it back to the original df
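
For reference, the workaround described in the list above might look roughly like the sketch below. It is not the original code: the sample frame is rebuilt from the table in the question, and it assumes the infinite ratio is converted to NaN before dropna(), since pandas returns inf (not NaN) for 6/0. The names ratio, valid, log_ratio and result are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "treated": [9.5, 10, 6, 6],
    "control": [9.6, 5, 0, 6],
})

# Step 1: ratio column; 6/0 becomes inf, so turn inf into NaN (assumption).
ratio = (df["treated"] / df["control"]).replace([np.inf, -np.inf], np.nan)

# Step 2: drop the rows where the ratio is missing.
valid = ratio.dropna()

# Step 3: take log2 of the remaining ratios.
log_ratio = np.log2(valid).rename("log_2_ratio")

# Step 4: left-join back onto the original frame by index;
# the dropped row gets NaN in the new column.
result = df.join(log_ratio, how="left")
print(result)
```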

As always, any help is very much appreciated :)


Solution

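  • Dividing a non-zero value by zero in pandas gives inf rather than raising an error, and np.log2(inf) is still inf, so you need to replace the inf with NaN: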

    import numpy as np
    df.assign(log_2_ratio=np.log2(df['treated'].div(df['control'])).replace(np.inf, np.nan))
    

    Output:

       treated  control  log_2_ratio
    0      9.5      9.6    -0.015107
    1     10.0      5.0     1.000000
    2      6.0      0.0          NaN
    3      6.0      6.0     0.000000
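
The same idea can be written as a self-contained sketch that stays inside a method chain. This is a variation on the answer, not part of it: the lambda in assign() lets the expression refer to the frame being chained instead of the outer df, and replacing -inf as well is an added assumption that covers a 0 in the treated column (log2(0) is -inf). The sample frame is rebuilt from the question.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "treated": [9.5, 10, 6, 6],
    "control": [9.6, 5, 0, 6],
})

result = df.assign(
    # d is the frame at this point in the chain, so no outer variable is needed.
    log_2_ratio=lambda d: np.log2(d["treated"].div(d["control"]))
    # Map both +inf (x/0) and -inf (log2 of 0) to NaN; the -inf part is an
    # extra precaution not present in the answer above.
    .replace([np.inf, -np.inf], np.nan)
)
print(result)
```

Rows where both columns are 0 need no extra handling, because 0/0 is already NaN in pandas and stays NaN through np.log2.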