Return list of rownames for each column that match condition


I'm trying to compare which variables load onto which factors for an Exploratory Factor Analysis in R. A variable counts as significant for a factor when the absolute value of its loading is at least 0.3. I have several datasets to process, so I'm trying to write a function that takes a dataset as input and returns, for each factor, the list of variables that load onto it.

As an example, consider the dataset below where the row names are the variables and the column names are the factors. Each dataset has at least 10 factors, so I'd prefer to have this function rather than manually add each factor to a list.

data <- as.data.frame(matrix(c(0.1, 0.2, 0.3, 0.01, 0.02, 0.4, 0.7, 0.2, 0.3, -0.4), nrow = 5, ncol=2))
rownames(data) <- c('A', 'B', 'C', 'D', 'E')
colnames(data) <- c("MR1", "MR2")

print(data)

   MR1  MR2
A 0.10  0.4
B 0.20  0.7
C 0.30  0.2
D 0.01  0.3
E 0.02 -0.4

I want to use the function

compare.loadings <- function(df) {
  groupings <- sapply(df, function(x) rownames(df)[abs(df[,x]) >= 0.3])
  return(groupings)
}

to return the expected list

print(compare.loadings(data))

$MR1
 [1] "C"

$MR2 
 [1] "A" "B" "D" "E"

However, the function currently returns a list of empty character vectors

$MR1
character(0)

$MR2
character(0)

I've searched many different answers about applying functions to columns in R, trying lapply, and I don't see why my function returns empty characters. How should I modify it to get my expected output?


Solution

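  • sapply (like lapply) passes each column's values to the anonymous function, not the column's name, so df[,x] tries to index the columns by the loadings themselves and matches nothing. Test x directly instead. lapply is also a good fit here: it always returns a list and skips the simplification step that sapply adds, which usually makes it somewhat faster: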

    compfun <- function(data){
      lapply(data, \(x) rownames(data)[abs(x) >= 0.3])
    }
    
    # > compfun(data)
    # $MR1
    # [1] "C"
    # 
    # $MR2
    # [1] "A" "B" "D" "E"
    

    It's more than twice as fast on these data:

    microbenchmark::microbenchmark(
      compfun = compfun(data),
      compare.loadings = compare.loadings(data)
    )
    
    # Unit: microseconds
    #             expr    min      lq     mean  median      uq    max neval
    #          compfun 16.779 17.4845 18.89481 18.2235 18.9365 60.843   100
    # compare.loadings 41.232 42.2090 44.11851 43.4725 44.3080 83.297   100
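
To see why the original attempt produced character(0), here is a minimal sketch using the question's data. It also shows one alternative fix that keeps the df[, ...] indexing by iterating over column names rather than columns. The names compare_by_name and cutoff are illustrative assumptions, not part of the question or the answer above.

```r
# Example data from the question
data <- as.data.frame(matrix(
  c(0.1, 0.2, 0.3, 0.01, 0.02, 0.4, 0.7, 0.2, 0.3, -0.4),
  nrow = 5, ncol = 2
))
rownames(data) <- c("A", "B", "C", "D", "E")
colnames(data) <- c("MR1", "MR2")

# Inside sapply(df, function(x) ...), x is the column's values.
# Used as a column index, fractional values truncate to 0,
# and index 0 selects no columns at all.
x <- data[["MR1"]]
n_selected <- ncol(data[, x])

# Hypothetical variant: iterate over the column names so df[, nm] is valid.
# simplify = FALSE keeps the result a list even if all lengths happen to match.
compare_by_name <- function(df, cutoff = 0.3) {
  sapply(
    names(df),
    function(nm) rownames(df)[abs(df[, nm]) >= cutoff],
    simplify = FALSE
  )
}

result <- compare_by_name(data)
print(result)
```

Since the question mentions several datasets, a function like this could be mapped over a named list of data frames, for example lapply(list(first = data), compare_by_name), which returns one list of groupings per dataset.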