Assuming the variable mile is sampled from a normally distributed population, how would I plot the theoretical normal distribution given estimates of its mean and variance?
data <- read.csv("data.csv", sep = "\t", header = TRUE)
data
name mile
1 dat1 5039
2 dat1 2883
3 dat2 135
4 dat2 104
5 dat3 32
6 dat3 192
I have calculated the mean and variance of mile as follows:
mean(data$mile)
[1] 1397.5
var(data$mile)
[1] 4410420
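Because the sigma in a normal density is the standard deviation, it may help to compute it right next to the mean and variance. The sketch below retypes the six mile values printed above as a plain vector (an assumption, so it runs without data.csv) and checks that sd() is the square root of var():

```r
# The six values of data$mile printed above, typed in so the example runs without data.csv
mile <- c(5039, 2883, 135, 104, 32, 192)

mu    <- mean(mile)  # sample mean, 1397.5
s2    <- var(mile)   # sample variance (n - 1 denominator), about 4410420
sigma <- sd(mile)    # sample standard deviation, about 2100.1

# The standard deviation is the square root of the variance
all.equal(sigma, sqrt(s2))
sigma
```

It is this standard deviation, not the variance, that belongs in the sigma argument of a normal PDF.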
But I am unsure whether this is even what is being asked. Has anyone dealt with a question like this before? Any help would be greatly appreciated.
Update
pdf_norm <- function(x,mu,sigma){
1/(sqrt(2*pi*sigma^2))*exp(-(x - mu)^2/(2*sigma^2))
}
mu <- 1397.5
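sigma <- sqrt(4410420)  # standard deviation = sqrt(variance), about 2100.1
x <- seq(mu - 3*sigma, mu + 3*sigma, length.out = 100)  # three-sigma rule: covers about 99.7% of the distribution
d <- pdf_norm(x, mu, sigma)
plot(x, d, type = "l", xlab = "X", ylab = "density")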
Up to this point, I get the following output: [density plot]
I have tried using the code below to superimpose a histogram onto the above plot:
hist(data$mile, add = T)
But this results in the following, which obviously isn't right: [plot with the histogram added]
Can anyone help?
To plot the theoretical distribution, first define its probability density function (PDF) (you can find the formula here, for example):
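For reference, the normal density with mean μ (mu) and standard deviation σ (sigma), so that σ² is the variance, is:

```latex
f(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)
```

With the estimates in the question, σ = √4410420 ≈ 2100.1.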
pdf_norm <- function(x,mu,sigma){
1/(sqrt(2*pi*sigma^2))*exp(-(x - mu)^2/(2*sigma^2))
}
Here, x is the random variable, mu is the mean, and sigma is the standard deviation (the square root of the variance, not the variance itself).
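As a sanity check, the hand-written pdf_norm should agree with R's built-in dnorm(), which is parameterised by mean and sd (standard deviation), not variance. A minimal comparison, assuming the question's estimates with sigma taken as sqrt(4410420):

```r
# Hand-written normal PDF, as defined above
pdf_norm <- function(x, mu, sigma) {
  1 / (sqrt(2 * pi * sigma^2)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

mu    <- 1397.5
sigma <- sqrt(4410420)  # standard deviation, not the variance

x <- seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 100)

# dnorm() uses the same parameterisation: mean and standard deviation
all.equal(pdf_norm(x, mu, sigma), dnorm(x, mean = mu, sd = sigma))
```

In practice dnorm(x, mean = mu, sd = sigma) can replace pdf_norm entirely; defining it by hand mainly makes the formula explicit.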
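After that, you can proceed to plotting. Set mu and sigma to your estimates and evaluate the PDF. Note that sigma must be the standard deviation, sqrt(4410420) here, not the variance 4410420. The plotting range mu ± 3*sigma is chosen using the three-sigma rule.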
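mu <- 1397.5
sigma <- sqrt(4410420)  # standard deviation = sqrt(variance), about 2100.1
x <- seq(mu - 3*sigma, mu + 3*sigma, length.out = 100)  # three-sigma rule: covers about 99.7% of the distribution
d <- pdf_norm(x, mu, sigma)
plot(x, d, type = "l", xlab = "X", ylab = "density")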
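The histogram in the question does not line up with the curve because hist() plots counts by default while the curve is a density, and add = TRUE reuses the axis limits of the first plot. One way to superimpose them (a sketch, assuming the six mile values shown in the question) is to draw the histogram on the density scale with freq = FALSE, with axis limits wide enough for both, and then add the curve with lines():

```r
mile  <- c(5039, 2883, 135, 104, 32, 192)  # values of data$mile shown in the question
mu    <- mean(mile)
sigma <- sd(mile)  # standard deviation, i.e. sqrt(var(mile))

x <- seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 200)
d <- dnorm(x, mean = mu, sd = sigma)  # theoretical normal density

# Compute the histogram without drawing it, to get its density heights
h <- hist(mile, plot = FALSE)
ylim <- c(0, max(h$density, d))  # tall enough for both bars and curve

# freq = FALSE puts the bars on the density scale (total bar area = 1)
hist(mile, freq = FALSE, xlim = range(x), ylim = ylim,
     xlab = "mile", main = "Sample vs. fitted normal density")
lines(x, d, lwd = 2)  # overlay the theoretical curve
```

With only six observations the histogram will be coarse, so the comparison is rough at best; the key points are freq = FALSE and shared axis limits.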
You can also verify that the PDF integrates to approximately 1 over this range:
integrate(function(x) pdf_norm(x, mu, sigma), mu-3*sigma, mu+3*sigma)
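Over mu ± 3*sigma the integral comes out near 0.997 rather than exactly 1, because roughly 0.3% of the probability lies beyond three standard deviations; that figure is pnorm(3) - pnorm(-3). Integrating over the whole real line should give 1 up to numerical error. A minimal check, assuming sigma = sqrt(4410420):

```r
pdf_norm <- function(x, mu, sigma) {
  1 / (sqrt(2 * pi * sigma^2)) * exp(-(x - mu)^2 / (2 * sigma^2))
}
mu    <- 1397.5
sigma <- sqrt(4410420)  # standard deviation

# Area within three standard deviations of the mean
within_3sd <- integrate(function(x) pdf_norm(x, mu, sigma), mu - 3 * sigma, mu + 3 * sigma)
# Area over the whole real line
whole_line <- integrate(function(x) pdf_norm(x, mu, sigma), -Inf, Inf)

# Exact three-sigma coverage from the standard normal CDF, about 0.9973
coverage <- pnorm(3) - pnorm(-3)

within_3sd$value
whole_line$value
coverage
```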