I'm trying to compare which variables load onto which factors for an Exploratory Factor Analysis in R. A variable counts as significant for a factor when the absolute value of its loading is greater than or equal to 0.3. I have a few different datasets I need to do this for, so I'm trying to write a function that takes a dataset as input and returns, for each factor, a list of the variables that load onto it.
As an example, consider the dataset below where the row names are the variables and the column names are the factors. Each dataset has at least 10 factors, so I'd prefer to have this function rather than manually add each factor to a list.
data <- as.data.frame(matrix(c(0.1, 0.2, 0.3, 0.01, 0.02, 0.4, 0.7, 0.2, 0.3, -0.4), nrow = 5, ncol=2))
rownames(data) <- c('A', 'B', 'C', 'D', 'E')
colnames(data) <- c("MR1", "MR2")
print(data)
   MR1  MR2
A 0.10  0.4
B 0.20  0.7
C 0.30  0.2
D 0.01  0.3
E 0.02 -0.4
I want to use the function
compare.loadings <- function(df) {
  groupings <- sapply(df, function(x) rownames(df)[abs(df[,x]) >= 0.3])
  return(groupings)
}
to return the expected list
print(compare.loadings(data))
$MR1
[1] "C"
$MR2
[1] "A" "B" "D" "E"
However, the function currently returns a list of empty character vectors:
$MR1
character(0)
$MR2
character(0)
I've searched many different answers about applying functions to columns in R and tried lapply, but I don't see why my function returns empty character vectors. How should I modify it to get my expected output?
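Inside the anonymous function, x is already the column's values rather than its name, so test abs(x) directly instead of indexing df with it. Using lapply for this returns the expected list and is also faster, since sapply has some overhead that usually makes it a little slower: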
compfun <- function(data){
  lapply(data, \(x) rownames(data)[abs(x) >= 0.3])
}
# > compfun(data)
# $MR1
# [1] "C"
#
# $MR2
# [1] "A" "B" "D" "E"
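As for why the original returned character(0): the sketch below is one way to see it, assuming the same example data as in the question. When sapply iterates over a data frame, x is each column's numeric vector, so df[, x] uses the loadings themselves as column indices; they are truncated toward zero and select no columns. The name compare.loadings.fixed is hypothetical and only illustrates that the original indexing works once the function iterates over column names instead.

```r
# Same example data as in the question.
data <- as.data.frame(matrix(c(0.1, 0.2, 0.3, 0.01, 0.02,
                               0.4, 0.7, 0.2, 0.3, -0.4),
                             nrow = 5, ncol = 2))
rownames(data) <- c("A", "B", "C", "D", "E")
colnames(data) <- c("MR1", "MR2")

# Inside sapply(df, function(x) ...), x is the column itself (a numeric
# vector), not its name. Used as a column index, every value here truncates
# to 0, and index 0 selects nothing.
x <- data[["MR1"]]
n_selected <- ncol(data[, x])  # number of columns picked by the bad index
print(n_selected)

# Iterating over the column names keeps the df[, nm] indexing valid.
# compare.loadings.fixed is a hypothetical name used only for this sketch.
compare.loadings.fixed <- function(df) {
  sapply(names(df),
         function(nm) rownames(df)[abs(df[, nm]) >= 0.3],
         simplify = FALSE)  # always return a named list, never a matrix
}

fixed <- compare.loadings.fixed(data)
print(fixed)
```

Passing simplify = FALSE matters here: without it, sapply would collapse the result into a vector or matrix whenever every factor happens to have the same number of significant variables.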
compfun is more than 2x faster than compare.loadings with these data:
microbenchmark::microbenchmark(
compfun = compfun(data),
compare.loadings = compare.loadings(data)
)
# Unit: microseconds
# expr min lq mean median uq max neval
# compfun 16.779 17.4845 18.89481 18.2235 18.9365 60.843 100
# compare.loadings 41.232 42.2090 44.11851 43.4725 44.3080 83.297 100
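Since the question mentions several datasets, the same idea can be applied to all of them in one call. The following is only a sketch: loadings_by_factor and its cutoff argument are hypothetical names, and set1 and set2 are made-up loading tables standing in for real EFA results.

```r
# Hypothetical helper: same logic as compfun, with the 0.3 threshold
# exposed as an argument (cutoff) so it can be changed per analysis.
loadings_by_factor <- function(df, cutoff = 0.3) {
  lapply(df, function(x) rownames(df)[abs(x) >= cutoff])
}

# Made-up loadings for illustration only; rows are variables,
# columns are factors, as in the question.
set1 <- data.frame(MR1 = c(0.10, 0.20, 0.30),
                   MR2 = c(0.40, 0.70, 0.20),
                   row.names = c("A", "B", "C"))
set2 <- data.frame(MR1 = c(0.50, -0.60, 0.00),
                   MR2 = c(0.10, 0.35, 0.90),
                   row.names = c("A", "B", "C"))

# One named list entry per dataset, each holding one entry per factor.
results <- lapply(list(set1 = set1, set2 = set2), loadings_by_factor)
str(results)
```

Keeping the datasets in a named list means the output is indexed the same way, for example results$set2$MR1 for the first factor of the second dataset.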