
Applying lm() using sapply or lapply


So I'm trying to use lm() with sapply.

#example data and labels
data <- matrix(data = runif(1000), nrow = 100, ncol = 10)
markers <- sample(0:1, replace = TRUE, size = 100)

# try to get linear model stuff
Lin <- sapply(data, function(x) lm(unlist(markers) ~ unlist(x))$coefficients)

My problem is that this gives me coefficients for 1000 equations rather than 10.


Solution

  • You need to supply sapply with a data frame, not a matrix.

    #example data and labels
    data <- data.frame(matrix(data = runif(1000), nrow = 100, ncol = 10))
    markers <- sample(0:1, replace = TRUE, size = 100)
    
    # try to get linear model stuff
    sapply(data, function(x) coef(lm(markers ~ x)))    
    sapply(data, function(x) coef(lm(markers ~ x))[-1]) # Omit intercepts
            X1.x         X2.x         X3.x         X4.x         X5.x 
     0.017043626  0.518378546 -0.011110972 -0.145848478  0.335232991 
            X6.x         X7.x         X8.x         X9.x        X10.x 
     0.015122184  0.001985933  0.191279594 -0.077689961 -0.107411203
    

    Your original matrix fails:

    data <- matrix(data = runif(1000), nrow = 100, ncol = 10)
    sapply(data, function(x) coef(lm(markers ~ x)))    
    # Error: variable lengths differ (found for 'x')
    

    This happens because sapply, which calls lapply, converts its first argument, X, to a list with as.list before applying the function. as.list applied to a matrix returns a list with one element per entry of the matrix, in your case 1,000, so each x passed to the function is a single number and lm() cannot match it against the 100 values in markers. as.list applied to a data frame returns a list with one element per column, in your case 10, and each element contains the values of that column.
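
    The difference in list lengths can be checked directly. This is a minimal sketch; the names m and df are hypothetical and only stand in for the matrix and data frame versions of the example data.

    ```r
    set.seed(1)  # only so the sketch is reproducible
    m  <- matrix(runif(1000), nrow = 100, ncol = 10)  # hypothetical name: matrix version
    df <- data.frame(m)                               # hypothetical name: data frame version

    n_matrix <- length(as.list(m))    # one list element per matrix entry
    n_frame  <- length(as.list(df))   # one list element per column
    col_len  <- lengths(as.list(df))  # number of values held by each element

    n_matrix  # 1000
    n_frame   # 10
    ```

    Every element of as.list(df) has length 100, which is what lm() needs to line x up with markers.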


    > lapply
    function (X, FUN, ...) 
    {
        FUN <- match.fun(FUN)
        if (!is.vector(X) || is.object(X)) 
            X <- as.list(X)
        .Internal(lapply(X, FUN))
    }
    <bytecode: 0x000002397f5ce508>
    <environment: namespace:base>
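
    If you would rather keep the matrix, one possible alternative (a sketch, not part of the answer above) is to iterate over columns with apply and MARGIN = 2, which hands each column to the function as a vector of length 100. The name coefs is hypothetical.

    ```r
    set.seed(1)  # only so the sketch is reproducible
    data <- matrix(runif(1000), nrow = 100, ncol = 10)
    markers <- sample(0:1, replace = TRUE, size = 100)

    # MARGIN = 2 applies the function to each column of the matrix
    coefs <- apply(data, 2, function(x) coef(lm(markers ~ x)))

    dim(coefs)  # 2 rows (intercept and slope) by 10 columns
    ```

    Converting to a data frame, as in the solution, gives the same ten fits; the apply form only avoids the conversion.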