python pandas normalization valueerror normal-distribution

Log transformation-ValueError: cannot convert float NaN to integer

The data in some columns doesn't follow a normal distribution, so I wanted to normalize those columns with a log transformation.

fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(14,6))
#1
sns.distplot(train_df['MasVnrArea'], fit=stats.norm, ax=ax[0])
ax[0].set_title('Before Normalization')

#2
train_df['MasVnrArea'] = np.log(train_df['MasVnrArea'])
ax[1].set_title('After Normalization')
sns.distplot(train_df['MasVnrArea'], fit=stats.norm, ax=ax[1])

Part #1 works fine, but when it comes to part #2 it gives me this error:

ValueError: cannot convert float NaN to integer

I already checked whether this column contains any NaN values, and it doesn't. So what's the problem?

Solution

When did you check if there are NaN values?

Did you check whether train_df['MasVnrArea'] contains values less than or equal to 0? np.log returns -inf for 0 and NaN for negative values, and either kind of non-finite value in the column will most likely make the plot on the next line throw this error.

Check again for NaN or infinite values after the log calculation.
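A minimal sketch of that check, assuming a small made-up Series in place of the real train_df['MasVnrArea'] (the values below are hypothetical, not taken from the question's data). It counts the non-positive inputs first, then counts the NaN and infinite results after the log:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_df['MasVnrArea']; the real column may differ.
area = pd.Series([0.0, 196.0, 162.0, -1.0, 350.0], name="MasVnrArea")

# Count inputs that np.log cannot map to a finite number.
n_nonpositive = int((area <= 0).sum())

# Silence the RuntimeWarnings so the result can be inspected quietly.
with np.errstate(divide="ignore", invalid="ignore"):
    logged = np.log(area)

# log(0) gives -inf and log(negative) gives NaN, so count both kinds.
n_nan = int(logged.isna().sum())
n_inf = int(np.isinf(logged).sum())

print(n_nonpositive, n_nan, n_inf)
```

If either of the last two counts is non-zero on the real column, the transformed data contains values that the plot cannot bin.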

Example from Using numpy.log() on 0

import numpy as np 
print(np.log(0))

Output:

-inf 
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: RuntimeWarning: divide by zero encountered in log

Explanation:

The logarithm of zero is not defined. It’s not a real number, because no real exponent turns a positive base into zero. NumPy returns -inf, the limit of log(x) as x approaches 0 from above, and emits a RuntimeWarning.
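One common workaround, sketched here under the assumption that the column is non-negative and that zeros are the only problem, is np.log1p, which computes log(1 + x). The sample values are hypothetical and stand in for train_df['MasVnrArea']:

```python
import numpy as np
import pandas as pd

# Hypothetical non-negative sample standing in for train_df['MasVnrArea'].
area = pd.Series([0.0, 196.0, 162.0, 350.0], name="MasVnrArea")

# log1p(x) = log(1 + x), so zero maps to 0.0 rather than -inf.
logged = np.log1p(area)

# Confirm that every transformed value is finite before plotting.
all_finite = bool(np.isfinite(logged).all())
print(logged.iloc[0], all_finite)
```

This does not help with values below -1, and any NaN already present in the column stays NaN, so it is worth checking the input range first.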