I am working with physical activity data and follow-up pain data. I have a large dataset but for the shake of this example I have created a small one with the variables of my interest.
As my physical activity data are compositional in nature I am using compositional data analysis before using those variables as predictors in my mixed-effects model. My goal is to use the predict() function to predict some new data that I have created but I am receiving the folowing:
Error in rep(0, nobs) : invalid 'times' argument
I have googled it and I saw a post that was posted a few year ago but the answer did not work for mine.
Below is the dataset and my code:
library("tidyverse")
library("compositions")
library("robCompositions")
library("lme4")
dataset <- structure(list(work = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L,
3L, 3L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"),
department = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L,
3L, 4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"),
worker = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L,
4L, 4L, 4L), .Label = c("1", "2", "3", "4"), class = "factor"),
age = c(45, 43, 65, 45, 76, 34, 65, 23, 23, 45, 32, 76),
sex = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L,
2L, 2L), .Label = c("1", "2"), class = "factor"), pain = c(4,
5, 3, 2, 0, 7, 8, 10, 1, 4, 5, 4), lpa_w = c(45, 65, 43,
76, 98, 65, 34, 56, 2, 3, 12, 34), mvpa_w = c(12, 54, 76,
87, 45, 23, 65, 23, 54, 76, 23, 54), lpa_l = c(54, 65, 34,
665, 76, 87, 12, 34, 54, 12, 45, 12), mvpa_l = c(12, 43,
56, 87, 12, 54, 76, 87, 98, 34, 56, 23)), class = "data.frame", row.names = c(NA,
-12L))
#create compositions of physical activity
dataset$comp_w <- acomp(cbind(lpa_w = dataset[,7],
mvpa_w = dataset[,8]))
dataset$comp_l <- acomp(cbind(lpa_l = dataset[,9],
mvpa_l = dataset[,10]))
#Make a grid to use for predictions for composition of lpa_w and mvpa_w
mygrid=rbind(
expand.grid(lpa_w = seq(min(2), max(98),5),
mvpa_w = seq(min(12), max(87), 5)))
griddata <- acomp(mygrid)
#run the model
model <- lmer(pain ~ ilr(comp_w) + age + sex + ilr(comp_l) +
(1 | work / department / worker),
data = dataset)
(prediction = predict(model, newdata = list(comp_w = griddata,
age = rep(mean(dataset$age, na.rm=TRUE),nrow(griddata)),
sex = rep("1", nrow(griddata)),
comp_l = do.call("rbind", replicate(n=nrow(griddata), mean(acomp(dataset[,12])), simplify = FALSE)),
work = rep(dataset$work, nrow(griddata)),
department = rep(dataset$department, nrow(griddata)),
worker = rep(dataset$worker, nrow(griddata)))))
Any help would be greatly appreciated.
Thanks
Assigning the results of acomp
to an element of a data frame gives a weird data structure that messes things up downstream.
Constructing this data set (without messing up the original dataset
):
dataset_weird <- dataset
dataset_weird$comp_w <- acomp(cbind(lpa_w = dataset[,7],
mvpa_w = dataset[,8]))
dataset_weird$comp_l <- acomp(cbind(lpa_l = dataset[,9],
mvpa_l = dataset[,10]))
The result is so weird that str(dataset_weird)
, the usual way of investigating the structure of an R object, fails with
$ comp_w :Error in unclass(x)[i, , drop = drop] : (subscript) logical subscript too long
If we run sapply(dataset_weird, class)
we see that these elements have class acomp
. (They also appear to have an odd print()
method: when we print(dataset_weird$comp_w)
the results are a matrix of strings, but if we unclass(dataset_weird$comp_w)
we can see that the underlying object is numeric [!])
This whole problem is kind of tricky since you're working with n-column matrices that are getting converted to special acomp()
objects that are then getting converted to (n-1)-dimensional matrices (isometric log-ratio-transformed compositional data), the columns of which are then getting used as predictors. The basic point is that lme4
's machinery will get confused if you have elements in your data frame that are not simple one-dimensional vectors. So you have to do the work of creating data frame columns yourself.
Here's what I came up with, with one missing piece (described below):
## utility function: *either* uses a matrix argument (`comp_data`)
## *or* extracts relevant columns from a data frame (`data`):
## returns ilr-transformed values as a matrix, with appropriate column names
ilr_dat <- function(data, suffix = NULL, comp_data = NULL) {
if (!is.null(suffix) && is.null(comp_data)) {
comp_data <- as.matrix(data[grep(paste0(suffix,"$"), names(data))])
}
ilrmat <- ilr(acomp(comp_data))
colnames(ilrmat) <- paste0("ilr", suffix, ".", 1:ncol(ilrmat))
return(ilrmat)
}
## augment original data set (without weird compositional elements)
## using data.frame() rather than $<- or rbind() collapses matrix arguments
## to data frame rows in a way that R expects
dataset2 <- data.frame(dataset, ilr_dat(dataset, "_l"))
dataset2 <- data.frame(dataset2, ilr_dat(dataset, "_w"))
mygrid <- rbind(
expand.grid(lpa_w = seq(min(2), max(98),5),
mvpa_w = seq(min(12), max(87), 5)))
## generate ilr data for prediction
griddata <- as.data.frame(ilr_dat(comp_data=mygrid, suffix="_w"))
#run the model: ilr(comp_l) **not** included, see below
model <- lmer(pain ~ ilr_w.1 + age + sex + ## ilr(comp_l) +
(1 | work / department / worker),
data = dataset2)
## utility function for replication
xfun <- function(s) rep(dataset[[s]], nrow(griddata))
predict(model, newdata = data.frame(griddata,
age = mean(dataset$age, na.rm=TRUE),
sex = "1",
work = xfun("work"),
department = xfun("department"),
worker = xfun("worker")))
This seems to work.
The reason I did not include the _l
composition/irl in the model or the predictions is that I couldn't understand what this statement was doing:
comp_l = do.call("rbind", replicate(n=nrow(griddata), mean(acomp(dataset[,12])), simplify = FALSE))