I am exploring some data, so the first thing I wanted to do was try to fit a normal (Gaussian) distribution to it. This is my first time trying this in R, so I'm taking it one step at a time. First I pre-binned my data:
myhist = data.frame(size = 10:27, counts = c(1L, 3L, 5L, 6L, 9L, 14L, 13L, 23L, 31L, 40L, 42L, 22L, 14L, 7L, 4L, 2L, 2L, 1L) )
qplot(x=size, y=counts, data=myhist)
Since I want counts, I need to add a normalization factor (N) to scale up the density:
fit = nls(counts ~ N * dnorm(size, m, s), data=myhist, start=c(m=20, s=5, N=sum(myhist$counts)) )
Then I create the fitted data for display and everything works great:
x = seq(10,30,0.2)
fitted = data.frame(size = x, counts=predict(fit, data.frame(size=x)) )
ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + geom_line(data=fitted)
I got excited when I found this thread which talks about using geom_smooth() to do it all in one step, but I can't get it to work:
Here's what I try... and what I get:
ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() + geom_smooth(method="nls", formula = counts ~ N * dnorm(size, m, s), se=F, start=list(m=20, s=5, N=300, size=10))
Error in method(formula, data = data, weights = weight, ...) :
parameters without starting value in 'data': counts
The error seems to indicate that it's trying to fit for the observed variable, counts, but that doesn't make any sense, and it predictably freaks out if I specify a "starting" value for counts too:
fitting parameters ‘m’, ‘s’, ‘N’, ‘size’, ‘counts’ without any variables
Error in eval(expr, envir, enclos) : object 'counts' not found
Any idea what I'm doing wrong? It's not the end of the world, of course, but fewer steps are always better, and you guys always come up with the most elegant solutions to these common tasks.
Thanks in advance!
Jeffrey
the first error indicates that ggplot2 cannot find the variable 'count', which is used in formula, in data.
Stats take place after mapping, that is, size -> x, and counts -> y.
Here is an example for using nls in geom_smooth:
ggplot(data=myhist, aes(x=size, y=counts)) + geom_point() +
geom_smooth(method="nls", formula = y ~ N * dnorm(x, m, s), se=F,
start=list(m=20, s=5, N=300))
The point is that using x and y, instead of size and counts, in the specification of formula.