Tags: r, plot, normal-distribution

How would I plot the theoretical normal distribution given an estimate of the mean and variance of a variable?


I am trying to figure this out. Assuming the variable mile is sampled from a population which is normally distributed, how would I plot the theoretical normal distribution given an estimate of the mean and variance?

data <- read.csv("data.csv", sep = "\t", header = TRUE)
data

   name    mile
1  dat1    5039
2  dat1    2883
3  dat2    135
4  dat2    104
5  dat3    32
6  dat3    192

I have calculated the mean and variance of mile as follows:

mean(data$mile)
[1] 1397.5

var(data$mile)
[1] 4410420

But I am unsure whether this is even what is being asked. Has anyone dealt with a question like this before? Any help would be greatly appreciated.
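
The normal density is parameterised by the standard deviation, which is the square root of the variance. A minimal sketch that recomputes all three from the six values shown in the table (typed in by hand here as an assumption, instead of reading data.csv; mu_hat, var_hat and sd_hat are illustrative names):

```r
# Assumption: the six mile values from the table above, entered directly
# in place of data$mile read from data.csv
mile <- c(5039, 2883, 135, 104, 32, 192)

mu_hat  <- mean(mile)  # sample mean
var_hat <- var(mile)   # sample variance (n - 1 denominator)
sd_hat  <- sd(mile)    # sample standard deviation = sqrt(var_hat)

print(c(mean = mu_hat, var = var_hat, sd = sd_hat))
```

So the standard deviation is about 2100, which is the value a normal PDF expects as its sigma parameter.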

Update

pdf_norm <- function(x,mu,sigma){
  1/(sqrt(2*pi*sigma^2))*exp(-(x - mu)^2/(2*sigma^2))
}

mu <- 1397.5
sigma <- 4410420
x <- seq(mu-3*sigma, mu+3*sigma,length.out = 100) # empirical rule 3 sigma rule
d <- pdf_norm(x, mu,sigma)

plot(x,d, xlab = "X", ylab = "density")

Up to this point, I get the following output:

[Image: Theoretical Normal Distribution]

I have tried using the code below to superimpose a histogram onto the plot above:

hist(data$mile, add = T)

But this results in:

[Image: Superimposing a histogram]

This obviously isn't right. Can anyone help?


Solution

  • To plot the theoretical distribution, you first need to define its PDF (for example, you can find the formula here):

    # Normal probability density function with mean mu and standard deviation sigma
    pdf_norm <- function(x, mu, sigma) {
      1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
    }
    

    Here, x is the point at which the density is evaluated, mu is the mean, and sigma is the standard deviation, i.e. the square root of the variance, not the variance itself.

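    After that, you can proceed to plotting. Set mu to your estimate of the mean and sigma to the square root of your variance estimate, then evaluate the PDF. The range is chosen using the three-sigma rule, which covers about 99.7% of a normal distribution.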

    mu <- 1397.5
    sigma <- sqrt(4410420)  # standard deviation = sqrt(variance), about 2100
    x <- seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 100)  # three-sigma rule
    d <- pdf_norm(x, mu, sigma)

    plot(x, d, type = "l", xlab = "X", ylab = "density")
    

    You can also check that the PDF integrates to approximately 1 over this range (about 0.997, since mu ± 3*sigma covers 99.7% of the probability mass):

    integrate(function(x) pdf_norm(x, mu, sigma), mu-3*sigma, mu+3*sigma)
    

    Output: [image]
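
Base R already provides the normal density as dnorm(x, mean, sd), and its third argument is likewise the standard deviation, not the variance. A minimal sketch, reusing the pdf_norm definition from the answer, that checks the two agree and uses pnorm to confirm how much probability the three-sigma range holds (max_diff and mass_3sd are illustrative names):

```r
# Hand-written normal PDF from the answer above
pdf_norm <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

mu    <- 1397.5
sigma <- sqrt(4410420)  # standard deviation, not the variance

x <- seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 100)

# Largest absolute difference between the hand-written PDF and dnorm()
max_diff <- max(abs(pdf_norm(x, mu, sigma) - dnorm(x, mean = mu, sd = sigma)))

# Probability mass inside mu +/- 3 sigma, computed from the normal CDF
mass_3sd <- pnorm(mu + 3 * sigma, mu, sigma) - pnorm(mu - 3 * sigma, mu, sigma)

print(max_diff)  # essentially zero (floating-point rounding only)
print(mass_3sd)  # about 0.9973
```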
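
The superimposed histogram in the question goes wrong for two reasons: sigma was set to the variance, which stretches the x range into the millions, and hist() plots counts by default, which are on a completely different scale from a density. A minimal sketch that draws both on the density scale, assuming the six mile values from the table are typed in directly rather than read from data.csv:

```r
# Assumption: the six mile values from the question's table, typed in directly
# instead of reading data.csv (use data$mile with the real file)
mile <- c(5039, 2883, 135, 104, 32, 192)

mu    <- mean(mile)
sigma <- sd(mile)  # standard deviation, i.e. sqrt(var(mile)), not the variance

# freq = FALSE draws densities instead of counts, so the bars share the PDF's scale;
# h keeps the bin data, and hist() still draws the plot
h <- hist(mile, freq = FALSE,
          xlim = c(mu - 3 * sigma, mu + 3 * sigma),
          main = "mile with fitted normal density", xlab = "mile")

# Overlay the theoretical normal curve across the current plot's x range
curve(dnorm(x, mean = mu, sd = sigma), add = TRUE, col = "red", lwd = 2)
```

With only six strongly right-skewed values, the bars will not follow the bell curve closely; the sketch only removes the scale mismatch so that the two can be compared at all.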