I'm trying to calculate the correlation between two variables for multiple groups (e.g. DT[, cor.test(var1, var2), group]). This works fine when I use cor.test(var1, var2, method = 'pearson') but throws an error when I use cor.test(var1, var2, method = 'spearman').
library(data.table)
DT <- as.data.table(iris)
# works perfectly
DT[,cor.test(Sepal.Length,Sepal.Width, method = 'pearson'), Species]
# Species statistic parameter p.value estimate null.value
# 1: setosa 7.680738 48 6.709843e-10 0.7425467 0
# 2: setosa 7.680738 48 6.709843e-10 0.7425467 0
# 3: versicolor 4.283887 48 8.771860e-05 0.5259107 0
# 4: versicolor 4.283887 48 8.771860e-05 0.5259107 0
# 5: virginica 3.561892 48 8.434625e-04 0.4572278 0
# 6: virginica 3.561892 48 8.434625e-04 0.4572278 0
# alternative method
# 1: two.sided Pearson's product-moment correlation
# 2: two.sided Pearson's product-moment correlation
# 3: two.sided Pearson's product-moment correlation
# 4: two.sided Pearson's product-moment correlation
# 5: two.sided Pearson's product-moment correlation
# 6: two.sided Pearson's product-moment correlation
# data.name conf.int
# 1: Sepal.Length and Sepal.Width 0.5851391
# 2: Sepal.Length and Sepal.Width 0.8460314
# 3: Sepal.Length and Sepal.Width 0.2900175
# 4: Sepal.Length and Sepal.Width 0.7015599
# 5: Sepal.Length and Sepal.Width 0.2049657
# 6: Sepal.Length and Sepal.Width 0.6525292
# error
DT[,cor.test(Sepal.Length,Sepal.Width, method = 'spearman'), Species]
# Error in `[.data.table`(DT, , cor.test(Sepal.Length, Sepal.Width, method = "spearman"), :
# Column 2 of j's result for the first group is NULL. We rely on the column types of the first
# result to decide the type expected for the remaining groups (and require consistency). NULL
# columns are acceptable for later groups (and those are replaced with NA of appropriate type
# and recycled) but not for the first. Please use a typed empty vector instead, such as
# integer() or numeric().
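The error says column 2 of the first group's result is NULL. A quick base R check can show which element of the htest object is missing. It compares the two methods for the first group (setosa). The helper null_fields() is hypothetical, written only for this illustration.

```r
# Compare the htest objects that cor.test() returns for the first group (setosa)
setosa <- iris[iris$Species == "setosa", ]

pearson <- cor.test(setosa$Sepal.Length, setosa$Sepal.Width, method = "pearson")
# iris contains tied values, so the Spearman test warns that it cannot compute
# an exact p-value; the warning is suppressed here to keep the output short
spearman <- suppressWarnings(
  cor.test(setosa$Sepal.Length, setosa$Sepal.Width, method = "spearman")
)

# Hypothetical helper: names of the list elements that are NULL
null_fields <- function(x) names(Filter(is.null, unclass(x)))
null_fields(pearson)   # no NULL elements
null_fields(spearman)  # "parameter"
```

The Spearman result has no degrees of freedom, so its second element (parameter) is NULL. That NULL is what data.table rejects for the first group.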
I know there are workarounds for this specific example, but is it possible to tell data.table beforehand what the column types are going to be, for any case using DT[i, j, by = 'something']?
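As far as I know, data.table takes the column types from the first group's result. One way to make them predictable regardless of the test is to have j return a fixed set of columns whose types you choose yourself, instead of the raw htest object. The sketch below uses a helper, tidy_htest(), which is hypothetical and not part of data.table or base R. The selection of columns is also an assumption. The helper substitutes NA_real_ for any element the test leaves NULL. It deliberately leaves out conf.int, which cor.test returns as a length-2 vector for Pearson and which is why the Pearson output above has two rows per species.

```r
library(data.table)

# Hypothetical helper: map an htest object onto a fixed schema so every group
# yields the same column names and types, whichever method was used
tidy_htest <- function(res) {
  # Replace a NULL element with a typed NA; otherwise drop names and coerce to double
  num_or_na <- function(x) if (is.null(x)) NA_real_ else as.numeric(unname(x))
  list(
    statistic = num_or_na(res$statistic),
    parameter = num_or_na(res$parameter),
    p.value   = num_or_na(res$p.value),
    estimate  = num_or_na(res$estimate),
    method    = res$method
  )
}

DT <- as.data.table(iris)

# Ties in iris make the Spearman test warn about exact p-values; suppressed here
spearman_by_species <- DT[, tidy_htest(suppressWarnings(
  cor.test(Sepal.Length, Sepal.Width, method = "spearman")
)), by = Species]

pearson_by_species <- DT[, tidy_htest(
  cor.test(Sepal.Length, Sepal.Width, method = "pearson")
), by = Species]
```

Both calls return one row per species with identical column types. For Spearman, parameter is NA.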
If you want to keep all columns, rather than remove the ones that are NULL, you can set the class of the "problem" column manually (in this case, the column causing the error is "parameter"). This is preferable to removing the NULLs if the column does contain values for some groups but not others.
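DT[, {
  res <- cor.test(Sepal.Length, Sepal.Width, method = 'spearman')
  # res$parameter is NULL for Spearman; setting its class to a basic type
  # coerces it to integer(0), a typed empty vector that data.table fills with NA
  class(res$parameter) <- 'integer'
  res
}, Species]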
# Species statistic parameter p.value estimate null.value alternative method data.name
#1: setosa 5095.097 NA 2.316710e-10 0.7553375 0 two.sided Spearman's rank correlation rho Sepal.Length and Sepal.Width
#2: versicolor 10045.855 NA 1.183863e-04 0.5176060 0 two.sided Spearman's rank correlation rho Sepal.Length and Sepal.Width
#3: virginica 11942.793 NA 2.010675e-03 0.4265165 0 two.sided Spearman's rank correlation rho Sepal.Length and Sepal.Width
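The trick works because class<-, when given a basic type name such as "integer", coerces the value to that type. Coercing NULL yields a zero-length vector of that type, which is the kind of typed empty vector the data.table error message asks for. A minimal base R check follows. It uses a stand-in list rather than a real htest object.

```r
# Stand-in for a cor.test() result whose 'parameter' element is NULL
res <- list(statistic = 1.5, parameter = NULL)
is.null(res$parameter)

# class<- with a basic type name coerces the value: NULL becomes integer(0)
class(res$parameter) <- "integer"
str(res$parameter)
```

Using "numeric" instead of "integer" would give a double column. That matches the type of parameter (the degrees of freedom) in the Pearson output, which may matter if the same code is reused across methods.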