I have seen a lot of documentation about the normal distribution and plotting its curve in Python, and I am a bit confused about it. I have generated normally distributed random values with mean 30 and standard deviation 3.7, and estimated the pdf with Excel's NORM.DIST function:
=NORM.DIST(A2,$H$2,$I$2,FALSE)
Based on this formula, I drew a scatter chart of the values against their pdf.
For demonstration purposes, I want to draw the same chart in Python. I found both scipy and numpy versions and would appreciate help clarifying how to use them.
I have tried the following code (a sample of my numbers is printed after it):
from scipy.stats import norm
import pandas as pd
import matplotlib.pyplot as plt
data_random =pd.read_excel("data_for_normal.xlsx")
data_values =data_random["NormalVariables"].values
pdf_values =norm.pdf(data_values,30,3.7)
plt.plot(data_values,pdf_values)
plt.title("normal curve")
plt.xlabel("x values")
plt.ylabel("probability density function")
plt.show()
Result of:
print(data_random.head(10))
NormalVariables
0 29.214494
1 30.170595
2 36.014144
3 30.388626
4 28.398749
5 24.861042
6 29.519316
7 24.207164
8 35.779376
9 26.042977
# plt.plot connects datapoints with lines:
x = [0,1,2]
y = [1,4,3]
plt.plot(x,y)
#note that lines are drawn between adjacent elements in the list,
#so a line from (0,1) to (1,4) and then to (2,3)
# if the order of the datapoints is changed, the position of the datapoints
# remains unchanged, but now lines are drawn between different points
x = [2,0,1]
y = [3,1,4]
plt.plot(x,y)
So the reason you see all the crisscrossing in your plot is that you plot unsorted data.
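If you would rather keep plt.plot, one option is to sort the x values before computing the pdf, so that lines are only drawn between neighbouring points. The following is a minimal sketch, not your exact setup: the synthetic sample generated with np.random.default_rng stands in for the column read from data_for_normal.xlsx, and the name sorted_values is chosen here for illustration.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Assumption: a synthetic sample replaces the values read from the Excel file
rng = np.random.default_rng(0)
data_values = rng.normal(loc=30, scale=3.7, size=200)

# Sort first, so adjacent array elements are also adjacent on the x-axis
sorted_values = np.sort(data_values)
pdf_values = norm.pdf(sorted_values, loc=30, scale=3.7)

fig, ax = plt.subplots()
ax.plot(sorted_values, pdf_values)
ax.set_title("normal curve")
ax.set_xlabel("x values")
ax.set_ylabel("probability density function")
# Call plt.show() here to display the figure in an interactive session
```

With your own data, replacing the synthetic sample by data_random["NormalVariables"].values should be enough; the rest stays the same.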
If you simply want to replicate the plot from Excel, use plt.scatter
instead. This plots just the data points and does not draw connections between them.
x = [2,0,1]
y = [3,1,4]
plt.scatter(x,y)
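For a demonstration it can also help to draw the theoretical curve on an evenly spaced grid and overlay the sampled points as a scatter. This is only a sketch under assumptions: the sample is synthetic (standing in for the Excel column), and the grid range of mean ± 4 standard deviations and the names x_grid, mean, and std are choices made here, not something from the question.

```python
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

mean, std = 30, 3.7

# Assumption: a synthetic sample replaces the values read from the Excel file
rng = np.random.default_rng(0)
data_values = rng.normal(loc=mean, scale=std, size=200)

# np.linspace returns evenly spaced, already sorted x values,
# so plt.plot draws a smooth curve without crisscrossing
x_grid = np.linspace(mean - 4 * std, mean + 4 * std, 400)

fig, ax = plt.subplots()
ax.plot(x_grid, norm.pdf(x_grid, mean, std), label="theoretical pdf")
# The sampled points are unsorted, so they are drawn as a scatter
ax.scatter(data_values, norm.pdf(data_values, mean, std),
           s=10, color="tab:orange", label="sampled values")
ax.set_xlabel("x values")
ax.set_ylabel("probability density function")
ax.legend()
# Call plt.show() here to display the figure in an interactive session
```

Because every scatter point is computed with the same norm.pdf call, the points lie exactly on the curve; the scatter only shows where the sample happens to fall.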