
Why does R plot the wrong distribution in this case?


I'm relatively new to R and I've been trying to simulate a normal distribution with R's built-in functions rnorm and dnorm, and then plot the result.

Why does R plot the wrong density function when my code is this:

x <- rnorm(1000, mean=5, sd=2)
hist(x, border='red',freq=F)
y <- curve(dnorm(x,mean(x), sd(x)), add=T)

But when my code is like this, it does plot the correct density function:

x <- rnorm(1000, mean=5, sd=2)
hist(x, border='red',freq=F)
meanx <- mean(x)
sdx <- sd(x)
y <- curve(dnorm(x,meanx,sdx), add=T)


Solution

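  • curve() does not take a vector of values as its first argument. It takes an expression written in terms of a variable named x (or the name of a function), and it evaluates that expression on its own grid of x values spanning the plot range. Inside curve(), every x therefore refers to that grid, not to the object x in your global environment. In your first example, mean(x) and sd(x) are the mean and standard deviation of the grid points, not of your sample.

    This is confusing, and you got unlucky: your data object happens to be called x, the same name curve() uses for its plotting variable, which is why your code runs without error but doesn't produce the expected output. Your second version works because meanx and sdx are computed from the data before curve() is called.

    This becomes clearer if you rename your object x to xx.

    Then your original code throws an error, because the expression passed to curve() no longer contains x: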

    set.seed(1234)
    xx <- rnorm(1000, mean=5, sd=2)
    hist(xx, border='red',freq=F)
    curve(dnorm(xx, mean(xx), sd(xx)), add=T)
    
    Error in curve(dnorm(xx, mean(xx), sd(xx)), add = T) : 
      'expr' must be a function, or a call or an expression containing 'x'
    

    But using x as the plotting variable for curve() together with your data xx, as in dnorm(x, mean(xx), sd(xx)), works as expected:

    set.seed(1234)
    xx <- rnorm(1000, mean=5, sd=2)
    hist(xx, border='red',freq=F)
    curve(dnorm(x, mean(xx), sd(xx)), add=T)
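
To see which values the original call actually used, one option is to inspect what curve() returns. The sketch below relies on curve() invisibly returning the x grid and y values it plotted. The names grid and fitted_density are made up for illustration and are not part of the question. Wrapping the density in a function, as in the last two lines, is one possible way to avoid the name clash entirely.

```r
set.seed(1234)
xx <- rnorm(1000, mean = 5, sd = 2)
hist(xx, border = "red", freq = FALSE)

# With add = TRUE, curve() evaluates the expression on a grid that spans
# the current x-axis limits; it returns that grid (and the y values) invisibly.
grid <- curve(dnorm(x, mean(x), sd(x)), add = TRUE)$x

length(grid)             # number of grid points (curve's default is n = 101)
c(mean(grid), sd(grid))  # parameters the mismatched curve was drawn with
c(mean(xx), sd(xx))      # parameters that were intended

# Hypothetical helper: its argument x is the grid supplied by curve(),
# while xx is looked up in the global environment, so nothing is shadowed.
fitted_density <- function(x) dnorm(x, mean = mean(xx), sd = sd(xx))
curve(fitted_density, add = TRUE, col = "blue")
```

Comparing the two printed pairs shows that the first curve is centered on the middle of the x-axis and is as wide as the axis itself, which is why it looks too flat next to the histogram.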