I am running the same piece of code, with the same seed, same package versions, same R version, on 3 different systems: 1) my computer 2) a linux cluster and 3) R snippets
packageVersion("truncnorm")
packageVersion("MASS")
set.seed(42)
fit<-NULL
x <- c(0.0916, 0.0084, 0.0442, 0.6254, 0.2021, 0.0135, 0.0259,
0.1557,0.0191, 0.3575, 0.1843, 0.1792, 0.0476, 0.0765,
0.0356, 0.0039, 0.1714, 0.1222, 0.2872, 0.395, 0.3334,
0.2223, 0.0096, 0.0436, 0.207)
mu0 <- mean(x)
sigma0 <- stats::sd(x)
fit <- MASS::fitdistr(x, densfun = function(xx, mu, sigma) {
truncnorm::dtruncnorm(xx, a = 0, b = 1, mean = mu, sd = sigma)
},
start = list(mu = mu0, sigma = sigma0),
lower = list(mu = -Inf, sigma = 0.05),
upper = list(mu = Inf, sigma = Inf))
print(fit)
On my computer, fit
is showing as NULL
, whereas in the other 2 systems, the model does fit successfully. Any ideas how that is possible?
P.S.: The issue on my system is
Error in MASS::fitdistr(x, densfun = function(xx, mu, sigma) { : optimization failed
If I change the data a little bit, for example remove 0.0084
from the data (this is the second number in the data), the model fits. Giving me the same outputs across all 3 systems.
Here is the sessionInfo()
from my own system:
R version 3.6.0 (2019-04-26) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C
LC_TIME=English_United States.1252attached base packages: [1] stats graphics grDevices utils
datasets methods baseother attached packages: [1] truncnorm_1.0-8 MASS_7.3-51.4
loaded via a namespace (and not attached): [1] ks_1.13.2
compiler_3.6.0 Matrix_1.2-17 mclust_5.4.7 tools_3.6.0
simIReff_1.0 [7] mvtnorm_1.1-3 KernSmooth_2.23-15 grid_3.6.0 pracma_2.3.3 lattice_0.20-38
This turns out to be a numerically unstable/sensitive problem.
If you turn on debug(MASS::fitdistr)
and step through, eventually you'll get to a line
if (res$convergence > 0L) stop("optimization failed")
If you print out the value of res
at this point you get (slightly abbreviated):
$par
mu sigma
-6.411168 1.022651
$value
[1] -21.72969
$counts
function gradient
81 81
$convergence
[1] 52
$message
[1] "ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"
in other words, the L-BFGS-B optimizer (which is used because you specified bounds - it's quite finicky) thinks there is a problem, and fitdistr
accordingly throws an error. As far as I can tell there is no way to tell fitdistr
"just give me the answer anyway".
I tried a bunch of different methods (perturbing the starting conditions a little bit, i.e. mu + 1e-3, sigma0 + 1e-3
; removing the bounds so that fitdistr
uses the more robust Nelder-Mead optimizer instead). Plotting the log10(1e-4 + neg log likelihood)
(so that we can see small differences from the minimum negative log-likelihood) gives the following image (code below):
[red: Linux, green: Windows/convergence failure, blue: Windows/perturbed start, cyan: Nelder-Mead]
These are the corresponding fits to the distribution:
As you can see (or probably can't!), all the fits are basically identical. If you calculate the negative log-likelihood, you'll see that they differ by less than 0.001 units [i.e., negligibly]. (You can also tell this because all the points in the first image lie within the log10(difference) = -3
contour.)
So the differences between answers don't really matter, just the annoyance of getting an error. You could (1) use a while
loop + try()
to perturb the starting value a little bit until you get an answer; (2) drop the bounds to allow Nelder-Mead to work: (3) use bbmle
or some other tool that lets you be a little bit more robust/defensive about the optimization procedure ...
nllfun <- function(mu, sigma) {
-sum(log(dtruncnorm(x, a = 0, b = 1, mean = mu, sd = sigma)))
}
library(emdbook)
library(truncnorm)
p1 <- c(-7.02938981, 1.06779942) ## Linux
p2 <- c(-6.411, 1.022651) ## Windows (convergence error)
p3 <- c(-6.587645, 1.0359466) ## Windows (perturbed start)
p4 <- c(-5.9937989, 0.9901366) ## Windows (Nelder-Mead/no bounds)
cc <- curve3d(nllfun(x,y), xlim = c(-7.1, -5.98), ylim = c(0.98, 1.07),
n = c(101, 101), sys3d = "none")
image(cc$x, cc$y, log10(cc$z-min(cc$z) + 1e-4))
contour(cc$x, cc$y, log10(cc$z-min(cc$z) + 1e-4), add = TRUE)
points(p1[1], p1[2], pch = 16, col = 2)
points(p2[1], p2[2], pch = 17, col = 3)
points(p3[1], p3[2], pch = 18, col = 4)
points(p4[1], p4[2], pch = 18, col = 5)
hist(x, freq=FALSE)
curve(dtruncnorm(x, a=0, b=1, mean=p1[1], sd = p1[2]), col = 2, add=TRUE)
curve(dtruncnorm(x, a=0, b=1, mean=p2[1], sd = p2[2]), col = 3, add=TRUE)
curve(dtruncnorm(x, a=0, b=1, mean=p3[1], sd = p3[2]), col = 4, add=TRUE)
curve(dtruncnorm(x, a=0, b=1, mean=p4[1], sd = p4[2]), col = 5, add=TRUE)