I have a dataset of 1000 values drawn from a mixture of two normal distributions, N(y1,1) and N(y2,1). The density looks like the following:
[Figure: density plot of the dataset]
I want to estimate the proportions of N(y1,1) and N(y2,1) in the dataset, along with the two means y1 and y2. This is my current approach:
library(stats4) # provides mle()
z <- #Dataset as vector with 1000 entries#
lik <- function(mu1, mu2, part) -sum(part*dnorm(z, mu1, 1, log=TRUE) + (1-part)*dnorm(z, mu2, 1, log=TRUE))
mle <- mle(lik, start=list(mu1=-7, mu2=5, part=0.33))
But this gives me the following error message:
Error in solve.default(oout$hessian) :
Lapack routine dgesv: system is exactly singular: U[1,1] = 0
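I redefined the likelihood to apply log() to the whole mixture density instead of passing the argument log = TRUE to each dnorm() call. The log-likelihood of a mixture is the log of the weighted sum of the component densities, not the weighted sum of their logs.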
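To make the difference concrete, the two objectives can be written side by side. The notation here is my own, not from the question: \pi stands for part and \varphi for the standard normal density.

```latex
% Mixture log-likelihood (what lik2 minimises the negative of):
\ell_{\text{mix}}(\mu_1,\mu_2,\pi)
  = \sum_{i=1}^{n} \log\bigl[\pi\,\varphi(z_i-\mu_1) + (1-\pi)\,\varphi(z_i-\mu_2)\bigr]

% Objective in the question (weighted sum of log densities):
\ell_{\text{orig}}(\mu_1,\mu_2,\pi)
  = \sum_{i=1}^{n} \bigl[\pi\,\log\varphi(z_i-\mu_1) + (1-\pi)\,\log\varphi(z_i-\mu_2)\bigr]
```

The second expression is linear in \pi, so for fixed means it has no interior optimum in \pi. That is at least consistent with the optimizer failing to settle on a point where the Hessian can be inverted, although I have not traced the exact cause of the singularity.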
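The following works in spite of the warnings. Note that they are warnings, not errors. They most likely arise when the optimizer tries values of part outside [0, 1], which can make the argument of log() negative.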
library(stats4)
set.seed(7850) # Make the results reproducible
z <- sample(c(rnorm(333, -7, 1), rnorm(667, 5, 1)))
plot(density(z))
lik2 <- function(mu1, mu2, part) -sum(log(part*dnorm(z, mu1, 1) + (1-part)*dnorm(z, mu2, 1)))
mle2 <- mle(lik2, start = list(mu1 = -6, mu2 = 6, part = 1/2))
#Warning messages:
#1: In log(part * dnorm(z, mu1, 1) + (1 - part) * dnorm(z, mu2, 1)) :
# NaNs produced
#2: In log(part * dnorm(z, mu1, 1) + (1 - part) * dnorm(z, mu2, 1)) :
# NaNs produced
#3: In log(part * dnorm(z, mu1, 1) + (1 - part) * dnorm(z, mu2, 1)) :
# NaNs produced
#4: In log(part * dnorm(z, mu1, 1) + (1 - part) * dnorm(z, mu2, 1)) :
# NaNs produced
mle2
#
#Call:
#mle(minuslogl = lik2, start = list(mu1 = -6, mu2 = 6, part = 1/2))
#
#Coefficients:
#        mu1        mu2       part
# -7.1091780  4.9377339  0.3330038
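If you would rather avoid the NaN warnings altogether, one option is to reparametrize so that the mixing proportion cannot leave (0, 1). The following is only a sketch under my own assumptions: the names lik3, mle3 and eta are hypothetical, part is replaced by plogis(eta), and the sum inside the log is evaluated with the log-sum-exp trick so it does not underflow when a mean is far from the data.

```r
library(stats4)

set.seed(7850) # Same simulated data as above
z <- sample(c(rnorm(333, -7, 1), rnorm(667, 5, 1)))

# Negative log-likelihood with part = plogis(eta), so part stays in (0, 1)
lik3 <- function(mu1, mu2, eta) {
  # Log of each weighted component density, computed on the log scale
  a <- plogis(eta, log.p = TRUE) + dnorm(z, mu1, 1, log = TRUE)
  b <- plogis(-eta, log.p = TRUE) + dnorm(z, mu2, 1, log = TRUE)
  # log(exp(a) + exp(b)) via log-sum-exp to avoid underflow
  m <- pmax(a, b)
  -sum(m + log(exp(a - m) + exp(b - m)))
}

mle3 <- mle(lik3, start = list(mu1 = -6, mu2 = 6, eta = 0))
est <- coef(mle3)

# Transform eta back to the mixing proportion
part_hat <- unname(plogis(est["eta"]))
c(est[c("mu1", "mu2")], part = part_hat)
```

The estimates should be close to those from mle2. Keep in mind that the standard errors reported by summary(mle3) refer to eta, not to part.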