The problem is with the graph produced by scipy.stats.probplot(). Samples from a normal distribution don't produce a straight line as expected.
I am trying to normalize some data using graphs as guidance.
However, after some strange results showing that zscore and log transformations were having no effect, I started looking for something wrong.
So I built a graph from synthetic values that should follow a normal distribution, and the resulting graph looks very odd.
Here are the steps to reproduce the array and the graph:
import math
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
mu = 0
variance = 1
sigma = math.sqrt(variance)
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
norm = stats.norm.pdf(x, mu, sigma)
# Plot the density curve
plt.plot(x, norm)
plt.show()
# Probability plot of the same array against a standard normal
_ = stats.probplot(norm, plot=plt, sparams=(0, 1))
plt.show()
[Image: distribution curve]
[Image: probability plot]
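Your synthesized data aren't normally distributed. numpy.linspace() returns evenly spaced values, and stats.norm.pdf() only evaluates the normal density at those points, so the array you pass to probplot() holds density heights, not samples drawn from a normal distribution. You can visualize this by adding seaborn.distplot(y, fit=scipy.stats.norm), where y is the array of pdf values.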
import math
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import seaborn as sns
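mu = 0
variance = 1
sigma = math.sqrt(variance)
# Evenly spaced points on [-3*sigma, 3*sigma]; these are not random samples
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
# Height of the normal density at each point, not normally distributed data
y = stats.norm.pdf(x, mu, sigma)
# Histogram of y with a fitted normal curve overlaid
# (distplot is deprecated since seaborn 0.11)
sns.distplot(y, fit=stats.norm)
# New figure for the probability plot
fig = plt.figure()
res = stats.probplot(y, plot=plt, sparams=(0, 1))
plt.show()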
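The same mismatch can be checked numerically, without a plot. The sketch below assumes the setup from the question and reads the correlation coefficient r that probplot() returns with its least-squares fit; the variable name r_pdf is only illustrative. A value noticeably below 1 suggests the points do not fall on a straight line.

```python
import numpy as np
from scipy import stats

mu, sigma = 0, 1
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)  # evenly spaced grid
y = stats.norm.pdf(x, mu, sigma)                  # density heights, not samples

# With fit=True (the default), probplot returns the ordered quantiles
# and a (slope, intercept, r) tuple from a least-squares line fit.
(osm, osr), (slope, intercept, r_pdf) = stats.probplot(y, dist="norm")

print(f"r for the pdf values: {r_pdf:.3f}")
```

No plotting backend is needed here, since probplot() only draws when a plot argument is passed.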
Try synthesizing your data with numpy.random.normal() instead. This will give you normally distributed data.
import math
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
import seaborn as sns
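mu = 0
variance = 1
sigma = math.sqrt(variance)
# Draw 100 random samples from a normal distribution
x = np.random.normal(loc=mu, scale=sigma, size=100)
# Histogram of the samples with a fitted normal curve overlaid
sns.distplot(x, fit=stats.norm)
# New figure for the probability plot
fig = plt.figure()
res = stats.probplot(x, plot=plt, sparams=(0, 1))
plt.show()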
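If you want the result to be repeatable, a seeded generator may help. The following is a minimal sketch, not part of the original answer: the seed value is arbitrary, and the deterministic variant built with stats.norm.ppf() on evenly spaced probabilities is an assumption about what "synthetic normal data" could mean when no randomness is wanted. Both arrays should give a probplot() correlation coefficient close to 1.

```python
import numpy as np
from scipy import stats

# Seeded generator so that repeated runs draw the same samples
# (the seed 0 is an arbitrary choice for this sketch).
rng = np.random.default_rng(0)
samples = rng.normal(loc=0, scale=1, size=100)

# Deterministic alternative: normal quantiles at evenly spaced
# probabilities, i.e. the inverse CDF rather than the pdf.
quantiles = stats.norm.ppf(np.linspace(0.005, 0.995, 100))

# Keep only r from the (slope, intercept, r) fit tuple.
_, (_, _, r_samples) = stats.probplot(samples, dist="norm")
_, (_, _, r_quantiles) = stats.probplot(quantiles, dist="norm")

print(f"r for random samples:   {r_samples:.3f}")
print(f"r for ppf-based values: {r_quantiles:.3f}")
```

The key difference from the question's code is that ppf() maps probabilities to values, while pdf() maps values to density heights.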