I have encountered a weird segfault error and have zero clue how to solve it. I was running a Markov chain Monte Carlo (MCMC) algorithm (a sequential algorithm that approximates a distribution), and I parallelize each iteration of the algorithm, so it looks something like:
for (iter in 1:T) {
  res[[iter]] <- mclapply(X, fun)
}
Now the weird thing is that when my dataset is moderately sized, the algorithm runs with no problem. But when I increase the dataset size (80,000 observations, not super large), the algorithm works for the first thousand iterations and then stops with a segfault error. I have pasted the error below:
*** caught segfault ***
address 0x20, cause 'memory not mapped'
Traceback:
1: mcfork()
2: FUN(X[[i]], ...)
3: lapply(seq_len(cores), inner.do)
4: mclapply(1:n, FUN = function(k) { return(OptimRE(dataSummaries[[k]], mu + beta, v, vre))}, mc.cores = ncores)
5: getMargLikelihood0(dataSummaries_layer1[[k]], mu, v, vre, beta[k], logarithm = TRUE)
6: FUN(X[[i]], ...)
7: lapply(X = S, FUN = FUN, ...)
8: doTryCatch(return(expr), name, parentenv, handler)
9: tryCatchOne(expr, names, parentenv, handlers[[1L]])
10: tryCatchList(expr, classes, parentenv, handlers)
11: tryCatch(expr, error = function(e) { call <- conditionCall(e) if (!is.null(call)) { if (identical(call[[1L]], quote(doTryCatch))) call <- sys.call(-4L) dcall <- deparse(call)[1L] prefix <- paste("Error in", dcall, ": ") LONG <- 75L msg <- conditionMessage(e) sm <- strsplit(msg, "\n")[[1L]] w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w") if (is.na(w)) w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L], type = "b") if (w > LONG) prefix <- paste0(prefix, "\n ") } else prefix <- "Error : " msg <- paste0(prefix, conditionMessage(e), "\n") .Internal(seterrmessage(msg[1L])) if (!silent && identical(getOption("show.error.messages"), TRUE)) { cat(msg, file = stderr()) .Internal(printDeferredWarnings()) } invisible(structure(msg, class = "try-error", condition = e))})
12: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
13: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
14: FUN(X[[i]], ...)
15: lapply(seq_len(cores), inner.do)
16: mclapply(1:length(beta), FUN = function(k) { return(getMargLikelihood0(dataSummaries_layer1[[k]], mu, v, vre, beta[k], logarithm = TRUE))}, mc.cores = ncores)
17: getMargLikelihood(dataSummaries_layer1, newm, news, newv, beta1)
18: FitPoissRegNRE(my[j, ], groupid, id1, id2, nb = nb, nc = nc, sig = sig, a = a, b = b, a2 = a2[j], b2 = b2[j], ps_m = ps_m, ps_s = ps_s, njump = njump)
19: ApplyFitPoissRegNRE(y, hashABC, hashAB, hashA, nb = 200, nc = 800, sig = 1000, a = 2, b = 2, a2 = rep(100, 3), b2 = rep(5, 3), ps_m = 0.01, ps_s = 0.03, njump = 4)
20: eval(expr, envir, enclos)
21: eval(ei, envir)
22: withVisible(eval(ei, envir))
I have googled this, and some people have encountered the same segfault issue in R; what they usually suggest is that it is a version conflict and R should be reinstalled. But the weird thing in my case is that my algorithm works properly for the first thousand iterations. I also ran it without parallelization and it works fine.
Could anyone suggest some possible causes for this? I have no direction at this point.
Thanks!
Each call to the mclapply function may be leaving around zombie processes. Since you're calling it repeatedly, you could be accumulating a huge number of them, eventually causing problems.
You can use the inline package to create a function that waits on all child processes in order to get rid of zombie processes:
library(inline)
includes <- '#include <sys/wait.h>'
code <- 'int wstat; while (waitpid(-1, &wstat, WNOHANG) > 0) {};'
wait <- cfunction(body=code, includes=includes, convention='.C')
If you call wait in the for loop after mclapply, it should get rid of any zombies and eliminate this as a possible problem:
for (iter in 1:T) {
  res[[iter]] <- mclapply(1:10, fun)
  wait()
}