r, apply, min

Compute minima across list elements with NA


I have a kludgy solution but feel silly writing so much code for what seems like a simple task. It runs quickly enough on lists of a few dozen MB, so I don't need to improve efficiency. But I'd still like help simplifying it.

I have a large list (n elements, each one a vector of length m). I need the m position-wise minima across all n elements, that is, for each position the smallest value found at that position in any of the vectors (the code below should make this clear). There are NAs: some positions have 0 complete cases, but most have >=1 complete case. I wrote some code that works fine, but it feels like there should be a much simpler way to get there. Can you streamline this code?

Specifically, is there a way to avoid the conditional for the minimum function, and is there an apply-family function that would let me avoid the first cbind?

# make data
library(magrittr)  # provides the %>% pipe used below

rawval <- replicate(10, sample(c(1:10, NA), size = 10, replace = TRUE),
                    simplify = FALSE)

# this seems clunky, does this function have a name?
mymin <- function(x) ifelse(all(is.na(x)), NA, min(x, na.rm = TRUE))

# I don't see why I should need two apply family functions here
tomin <- sapply(rawval, cbind) %>% apply(MARGIN = 1, FUN = mymin)

Apologies, I suspect this is a duplicate question :(


Solution

  • What you want is mapply. It applies a function to the first elements of each of its arguments, then to the second elements, and so on. See its help page (?mapply).
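
    A minimal sketch of that position-wise behaviour, on made-up toy data: do.call() passes each vector in the list to mapply() as a separate argument, so the function receives one value from every vector at a time. The names vals and safe_min are hypothetical and do not come from the question.

    ```r
    # Hypothetical toy data: three vectors of length 4
    vals <- list(c(3, NA, 5, NA), c(1, NA, 7, 2), c(NA, NA, 6, 4))

    # Minimum that stays NA when every value is missing
    # (safe_min is an illustrative helper, not a base R function)
    safe_min <- function(...) {
        x <- c(...)
        if (all(is.na(x))) NA_real_ else min(x, na.rm = TRUE)
    }

    # Spread the list so mapply() sees each vector as its own argument
    res <- do.call(mapply, c(list(FUN = safe_min), vals))
    print(res)  # expected values: 1 NA 5 2
    ```

    Passing the list as a single argument, as in mapply(min, vals), would instead loop over the vectors themselves, much like sapply().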

    Here is a suggested function. I'm not really sure about the sum part, but if I understood correctly, you only want the min of the rows that have a positive sum.

    I benchmarked my_function against your_function and got the following results:

    UPDATE: I also included a my_updated_function in the benchmark, where I simply use pmin.int. I now understand your point: if all values are NA, keep NA as the "min". I previously thought there would be negative values.
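
    To check the all-NA case on a small example, the sketch below compares pmin() with na.rm = TRUE against a row-wise minimum with an explicit NA guard. The toy data and the names vals, by_pmin, guarded_min and by_rows are assumptions for illustration. For numeric input, pmin() returns NA at a position where every value is missing, so no extra conditional is needed there.

    ```r
    # Hypothetical toy data: position 2 is NA in every vector
    vals <- list(c(3, NA, 5, NA), c(1, NA, 7, 2), c(NA, NA, 6, 4))

    # Parallel minimum across the list; NAs are ignored where possible
    by_pmin <- do.call(pmin, c(vals, na.rm = TRUE))

    # Reference: bind the vectors as columns, then take guarded row minima
    guarded_min <- function(x) if (all(is.na(x))) NA else min(x, na.rm = TRUE)
    by_rows <- apply(do.call(cbind, vals), 1, guarded_min)

    print(by_pmin)  # expected values: 1 NA 5 2
    print(isTRUE(all.equal(by_pmin, by_rows)))
    ```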

    I included the pmin solution you suggested (using ifelse) and the @jay.sf solution.

    rawval <- replicate(
        1000,
        sample(c(1:10, NA), size = 1000, replace = TRUE),
        simplify = FALSE
    )
    
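    my_function <- function(values) {
        # NOTE: given a single list, mapply() loops over the list elements,
        # so these are per-vector sums and minima, not position-wise ones.
        sums <- mapply(sum, values, na.rm=TRUE)
        mins <- mapply(min, values, na.rm=TRUE)
        mins[sums <= 0] <- NA
        return(mins)
    }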
    
    my_updated_function <- function(values) {
        mins <- do.call(pmin.int, c(values, na.rm=TRUE))
        # Written assuming an all-NA position would come back as 0 (the example
        # data are positive integers). pmin.int(..., na.rm=TRUE) returns NA when
        # every input at a position is NA, so this line changes nothing here.
        mins[mins == 0] <- NA
        return(mins)
    }
    
    your_function <- function(values) {
        mymin <- function(x) ifelse(sum(x, na.rm=TRUE) > 0, min(x, na.rm=TRUE), NA)
        
        # I don't see why I should need two apply family functions here
        tomin<- apply(sapply(values, cbind), MARGIN = 1, FUN = mymin)
        return(tomin)
    }
    
    pmin_function <- function(values) {
        sums <- mapply(sum, values, na.rm=TRUE)
        mins <- do.call(pmin, c(values, na.rm = TRUE))
        mins[sums <= 0] <- NA
        return(mins)
    }
    
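    jay_sf_function <- function(values) {
        # NOTE: sapply() over the list gives one minimum per vector,
        # not one per position across vectors.
        return(sapply(values, \(x) ifelse(!all(is.na(x)), min(x, na.rm=TRUE), NA)))
    }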
    
    microbenchmark::microbenchmark(
        your_function(rawval),
        my_function(rawval),
        my_updated_function(rawval),
        pmin_function(rawval),
        jay_sf_function(rawval)
    )
    
    Unit: milliseconds
                            expr     min       lq      mean   median       uq     max neval
           your_function(rawval) 29.0871 32.77735 34.676408 34.37340 35.91040 77.6884   100
             my_function(rawval)  4.8762  5.16365  5.376355  5.37335  5.52475  7.3706   100
     my_updated_function(rawval)  2.6481  2.72655  2.872085  2.78275  2.92460  4.0724   100
           pmin_function(rawval)  5.7140  5.95945  6.268012  6.13110  6.35375  9.4198   100
         jay_sf_function(rawval)  4.8583  5.13700  6.839790  5.43480  6.45270 47.6075   100