Search code examples
rphylogeny

PGLS returns an error when referring to variables by their column position in a caper object


I am carrying out PGLS between a trait and 21 environmental variables for a clade of plant species. I am using a loop to do this 21 times, once for each of the environmental variables, and extract the p-values and some other values into a results matrix.

When normally carrying each PGLS individually I will refer to the variables by their column names, for example:

pgls(**trait1**~**meanrainfall**, data=caperobject)

But in order to loop this process for multiple environmental variables, I am referring to the variables by their column position in the data frame (which is in the form of the caper object for PGLS) instead of their column name:

pgls(**caperobject[,2]**~**caperobject[,5]**, data=caperobject)

This returns the error:

Error in model.frame.default(formula, data$data, na.action = na.pass) : invalid type (list) for variable 'caperobject[, 2]'

This is not a problem when running a linear regression using the original data frame -- referring to the variables by their column name only produces this error when using the caper object as the data using PGLS. Does this way of referring to the column names not work for caper objects? Is there another way I could refer to the column names so I can incorporate them into a PGLS loop?


Solution

  • Your solution is to use caperobject$data[,2] ~ caperobject$data[,5], because comparative.data class is a list with the trait values located in the list data. Here is an example:

    library(ape)
    library(caper)
    
    # generate random data
    seed <- 245937
    tr <- rtree(10)
    dat <- data.frame(taxa = tr$tip.label, 
                      trait1 = rTraitCont(tr, root.value = 3), 
                      meanrainfall = rnorm(10, 50, 10))
    
    # prepare a comparative.data structure 
    caperobject <- comparative.data(tr, dat, taxa, vcv = TRUE, vcv.dim = 3)
    
    # run PGLS
    pgls(trait1 ~ meanrainfall, data = caperobject)
    pgls(caperobject$data[, 1] ~ caperobject$data[, 2], data = caperobject)
    

    Both options return identical values for the intercept = 3.13 and slope = -0.003.

    A good practice in problems with data format is to check, how the data are stored with str(caperobject).