I couldn't find a similar question. I have a DataFrame with some highly skewed columns. I plan to log-transform these columns and then standardize them. However, when I apply the log transform I get NaNs. Should I replace these with 0s?
log_train[skew_cols]=np.log2(featuresdf[skew_cols])
The warning I get is:
RuntimeWarning: invalid value encountered in log2
This is separate from the ipykernel package so we can avoid doing imports until
I'm not sure what I am doing wrong.
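You shouldn't replace them with 0s, because np.log2(1) is equal to 0. After the replacement, an original value of 1 and every value that could not be logged would both be 0 in your log data, so you could no longer tell them apart. Note that np.log2 returns NaN only for negative inputs (that is what the "invalid value" warning refers to); an input of 0 gives -inf instead.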
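To make the collision concrete, here is a small sketch on made-up values (the Series below is hypothetical, not your data). It shows what np.log2 returns for a negative number, for 0 and for 1, and what happens if the NaN is then filled with 0:

```python
import numpy as np
import pandas as pd

# Hypothetical sample values: one negative, one zero, two positives.
values = pd.Series([-1.0, 0.0, 1.0, 2.0])

# Silence the RuntimeWarnings for this demonstration only.
with np.errstate(divide="ignore", invalid="ignore"):
    logged = np.log2(values)

# Negative input gives NaN, zero gives -inf, and 1 gives exactly 0.
# Filling the NaN with 0 makes the original -1 look the same as the original 1.
filled = logged.fillna(0)
print(pd.DataFrame({"original": values, "log2": logged, "filled": filled}))
```

Also note that fillna(0) leaves the -inf from the zero input untouched, so it would not clean the column fully anyway.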
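Instead, add 1 to your data before taking the log. An original value of 0 then maps to log2(1) = 0, an original 1 maps to log2(2) = 1, and an original 2 maps to log2(3) ≈ 1.58. This assumes the columns contain no values below 0; for values of -1 or less, the log of x + 1 is still undefined.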
So the code would be:
log_train[skew_cols]=np.log2(featuresdf[skew_cols]+1)
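A fuller sketch of that step might look like the following. The contents of featuresdf and skew_cols are invented here for illustration, the check for negative values is my own addition, and the standardization at the end is just one possible way (plain mean and standard deviation) to do the step you mentioned:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's data; the real frame will differ.
featuresdf = pd.DataFrame({
    "a": [0.0, 1.0, 3.0, 7.0],
    "b": [0.0, 2.0, 2.0, 15.0],
})
skew_cols = ["a", "b"]

# Adding 1 only helps for values >= 0, so fail loudly on negatives.
if (featuresdf[skew_cols] < 0).any().any():
    raise ValueError("negative values present; log2(x + 1) may still give NaN")

log_train = featuresdf.copy()
log_train[skew_cols] = np.log2(featuresdf[skew_cols] + 1)

# One possible standardization: subtract the column mean, divide by the column std.
standardized = (log_train[skew_cols] - log_train[skew_cols].mean()) / log_train[skew_cols].std()
print(log_train)
print(standardized)
```

If the base of the logarithm does not matter to you, np.log1p(x) computes the natural log of 1 + x directly.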
The other option is to use a different transformation that can handle 0, such as the square root (np.sqrt).
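As a rough sketch of the square-root option, again on an invented frame (featuresdf and skew_cols below are placeholders): 0 stays 0, so no offset is needed, but like the log it assumes the values are not negative.

```python
import numpy as np
import pandas as pd

# Hypothetical data containing a zero; np.sqrt handles it without any offset.
featuresdf = pd.DataFrame({"a": [0.0, 1.0, 4.0, 9.0]})
skew_cols = ["a"]

sqrt_train = featuresdf.copy()
sqrt_train[skew_cols] = np.sqrt(featuresdf[skew_cols])
print(sqrt_train)
```

The square root compresses large values less strongly than the log does, so it may not remove as much skew.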