pandas, normalization, logarithm

When I log-transform a pandas column I get NaNs. Should I replace these with 0?


I cannot find a similar question. I have a df in which some columns are highly skewed. I plan to log-transform these columns and then standardize them. However, when I log-transform them I get NaNs. Should I replace these with 0s?

log_train[skew_cols]=np.log2(featuresdf[skew_cols])

The warning I get is:

RuntimeWarning: invalid value encountered in log2
  This is separate from the ipykernel package so we can avoid doing imports until

I'm not sure what I am doing wrong.


Solution

  • You shouldn't replace them with 0s, because np.log2(1) is already equal to 0. Original values of 1 and 0 would then both be 0 in your log data, and you could no longer tell them apart.

    Instead, add 1 to your data before taking the log. An original 0 then becomes log2(1) = 0, an original 1 becomes log2(2) = 1, and an original 2 becomes log2(3) ≈ 1.58.

    So the code would be:

    log_train[skew_cols]=np.log2(featuresdf[skew_cols]+1)
    

    The other option is to use a different transformation that can handle 0, such as the square root (np.sqrt).
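
As a rough illustration of the options above, here is a minimal sketch on made-up data (the frame contents and the column name "a" are assumptions, not the asker's data). It also shows a caveat: in NumPy, log2 of 0 gives -inf, while log2 of a negative number gives NaN with the "invalid value encountered" warning, so NaNs suggest the columns may contain negative values. Adding 1 fixes zeros but not values at or below -1. np.log1p is used here as an equivalent way to write log(1 + x).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for featuresdf; the column name "a" is made up.
featuresdf = pd.DataFrame({"a": [-3.0, 0.0, 1.0, 3.0]})
skew_cols = ["a"]

# Count problem values before transforming: zeros and negatives.
n_zero = int((featuresdf[skew_cols] == 0).sum().sum())
n_negative = int((featuresdf[skew_cols] < 0).sum().sum())

# errstate only silences the RuntimeWarnings for this demonstration.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = np.log2(featuresdf[skew_cols])          # negative -> NaN, zero -> -inf
    shifted = np.log2(featuresdf[skew_cols] + 1)  # zero -> 0, but -3 is still NaN
    # np.log1p(x) is the natural log of (1 + x); dividing by ln(2) gives base 2.
    via_log1p = np.log1p(featuresdf[skew_cols]) / np.log(2)
    root = np.sqrt(featuresdf[skew_cols])         # zero -> 0, negative -> NaN

print("zeros:", n_zero, "negatives:", n_negative)
print(raw["a"].tolist())
print(shifted["a"].tolist())
```

If the counts show negative values, neither the +1 shift nor the square root will remove every NaN, so it is worth checking why those values are there before choosing a transformation.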