The data in some columns doesn't follow a normal distribution, so I wanted to normalize it with a log transformation.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(14,6))
#1
sns.distplot(train_df['MasVnrArea'], fit=stats.norm, ax=ax[0])
ax[0].set_title('Before Normalization')
#2
train_df['MasVnrArea'] = np.log(train_df['MasVnrArea'])
ax[1].set_title('After Normalization')
sns.distplot(train_df['MasVnrArea'], fit=stats.norm, ax=ax[1])
Part #1 works fine, but part #2 gives me this error:
ValueError: cannot convert float NaN to integer
I already checked whether this column contains any NaN values, and it doesn't. So what's the problem?
When did you check if there are NaN values?
Did you check whether train_df['MasVnrArea'] has values equal to or below 0?
If there are values equal to or below 0, np.log returns -inf for 0 and NaN for negative values, and the plot on the next line will throw the error.
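A quick way to test this is to count the non-positive values before the transform and the non-finite values after it. The sketch below uses a small made-up Series in place of train_df['MasVnrArea'] (the real data is not shown in the question), so the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df['MasVnrArea']; the values are invented.
col = pd.Series([0.0, 196.0, 0.0, 162.0, 350.0], name="MasVnrArea")

# Values that np.log cannot map to a finite number.
non_positive = int((col <= 0).sum())
print("values <= 0:", non_positive)

# Silence the RuntimeWarning so the result can be inspected quietly.
with np.errstate(divide="ignore", invalid="ignore"):
    logged = np.log(col)

# Count -inf / inf / NaN entries produced by the transform.
non_finite = int((~np.isfinite(logged)).sum())
print("non-finite after log:", non_finite)

# isna() only flags NaN, so the -inf entries are not counted here.
nan_count = int(logged.isna().sum())
print("NaN after log:", nan_count)
```

Note that -inf is not NaN, so a NaN check alone would not flag the zeros; np.isfinite catches both cases.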
Example from "Using numpy.log() on 0":
import numpy as np
print(np.log(0))
Output:
-inf
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in log
Explanation:
The logarithm of zero is not defined. It's not a real number, because no real power of a positive base equals zero. NumPy returns -inf, the limit of log(x) as x approaches 0 from above, and emits a RuntimeWarning.
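One common workaround, offered here as a suggestion rather than something taken from the question, is np.log1p, which computes log(1 + x) and therefore maps 0 to 0. The sketch below again uses an invented Series in place of train_df['MasVnrArea'] and assumes the column has no negative values; whether this transform suits the real data is a separate judgment call.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df['MasVnrArea']; the values are invented.
col = pd.Series([0.0, 196.0, 0.0, 162.0, 350.0], name="MasVnrArea")

# log1p(x) = log(1 + x) is finite for every x > -1, so zeros become 0.0.
transformed = np.log1p(col)

# Confirm that nothing non-finite was produced before plotting.
all_finite = bool(np.isfinite(transformed).all())
print(transformed)
print("all finite:", all_finite)
```

With every value finite, the transformed column can be passed to the plotting call in place of the np.log result.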