I have a GAMLSS model I'm trying to fit to multiple subsets of my data. Each month needs to be analyzed separately, so I'm using a foreach loop to iterate through the months. However, when I parallelize my loop, the results of dropterm
all get NA'd. Here's a similar example using built-in data:
library(dplyr)
library(gamlss)
library(MASS)
nCores <- detectCores()
gamlssCl <- makeCluster(nCores)
registerDoParallel(gamlssCl)
test.par <- foreach(s = unique(iris$Species),
.packages = c('dplyr', 'gamlss', 'MASS')) %dopar% {
species.data <- filter(iris, Species == s)
model <- gamlss(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Length,
data = species.data,
family = GA)
var.rank <- dropterm(model, test = 'Chisq') %>%
mutate(Variable = row.names(.)) %>%
arrange(AIC) %>%
filter(Variable != '<none>')
var.rank
}
stopCluster(gamlssCl)
test.par
# [[1]]
# Df AIC LRT Pr(Chi) Variable
# 1 NA NA NA NA Sepal.Length
# 2 NA NA NA NA Sepal.Width
# 3 NA NA NA NA Petal.Length
#
# [[2]]
# Df AIC LRT Pr(Chi) Variable
# 1 NA NA NA NA Sepal.Length
# 2 NA NA NA NA Sepal.Width
# 3 NA NA NA NA Petal.Length
#
# [[3]]
# Df AIC LRT Pr(Chi) Variable
# 1 NA NA NA NA Sepal.Length
# 2 NA NA NA NA Sepal.Width
# 3 NA NA NA NA Petal.Length
test.serial <- foreach(s = unique(iris$Species)) %do% {
species.data <- filter(iris, Species == s)
model <- gamlss(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Length,
data = species.data,
family = GA)
var.rank <- dropterm(model, test = 'Chisq') %>%
mutate(Variable = row.names(.)) %>%
arrange(AIC) %>%
filter(Variable != '<none>')
var.rank
}
test.serial
# [[1]]
# Df AIC LRT Pr(Chi) Variable
# 1 1 -31.66335 0.06406465 0.8001832 Sepal.Width
# 2 0 -29.72741 0.00000000 NA Petal.Length
# 3 1 -29.43731 2.29010516 0.1302011 Sepal.Length
#
# [[2]]
# Df AIC LRT Pr(Chi) Variable
# 1 0 31.03608 0.000000 NA Petal.Length
# 2 1 33.81852 4.782442 2.875132e-02 Sepal.Width
# 3 1 56.00459 26.968510 2.067972e-07 Sepal.Length
#
# [[3]]
# Df AIC LRT Pr(Chi) Variable
# 1 1 16.29265 0.08628226 7.689578e-01 Sepal.Width
# 2 0 18.20637 0.00000000 NA Petal.Length
# 3 1 77.14978 60.94341742 5.873901e-15 Sepal.Length
Note: The error doesn't manifest when using glm
instead of gamlss
Sorry, no solution yet, but here's a minimal example that illustrate the issue and that doesn't depend on foreach.
First, do:
library("gamlss")
data <- subset(iris, Species == "setosa")
model <- gamlss(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Length,
data = data, family = GA)
## GAMLSS-RS iteration 1: Global Deviance = -37.7274
## GAMLSS-RS iteration 2: Global Deviance = -37.7274
model2 <- dropterm(model, test = "Chisq")
print(model2)
## Single term deletions for
## mu
##
## Model:
## Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Length
## Df AIC LRT Pr(Chi)
## <none> -29.727
## Sepal.Length 1 -29.437 2.29011 0.1302
## Sepal.Width 1 -31.663 0.06406 0.8002
## Petal.Length 0 -29.727 0.00000
and then save the results to file:
saveRDS(list(model = model, model2 = model2), file = "gamlss.rds")
Then in a fresh R session (R --vanilla
), do:
> library("gamlss")
Loading required package: splines
Loading required package: gamlss.data
Loading required package: gamlss.dist
Loading required package: MASS
Loading required package: nlme
Loading required package: parallel
********** GAMLSS Version 5.0-1 **********
For more on GAMLSS look at http://www.gamlss.org/
Type gamlssNews() to see new features/changes/bug fixes.
> gamlss <- readRDS("gamlss.rds")
> model <- gamlss$model
> class(model)
[1] "gamlss" "gam" "glm" "lm"
> model2 <- dropterm(model, test = "Chisq")
Model with term Sepal.Length has failed
Model with term Sepal.Width has failed
Model with term Petal.Length has failed
> print(model2)
Single term deletions for
mu
Model:
Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Length
Df AIC LRT Pr(Chi)
<none> -29.727
Sepal.Length
Sepal.Width
Petal.Length
Compare the output of model2
in the fresh R session compared to that of the first session above;
> all.equal(model2, gamlss$model2)
[1] "Component “Df”: 'is.NA' value mismatch: 1 in current 4 in target"
[2] "Component “AIC”: 'is.NA' value mismatch: 0 in current 3 in target"
[3] "Component “LRT”: 'is.NA' value mismatch: 1 in current 4 in target"
[4] "Component “Pr(Chi)”: 'is.NA' value mismatch: 2 in current 4 in target"
Something is clearly not correct here.
I suspect that the model
object contains one or more so-called promises that are not preserved correctly when transferred to another R process (as is the case when you use SNOW clusters).
I would argue this is an issue with the gamlss package itself. The problem appears to be that gamlss object cannot be serialized. I suggest that you report this to the package maintainer. Feel free to use my minimal example in your report.