
How to apply a function to several columns listed in a vector in a function


Within a function, I am trying to add a column to a data frame that holds the row-wise minimum of several other columns, whose names are passed to the function as an argument.

A minimal data set would be:

C1 <- c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0)
C2 <- c(0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0)
C3 <- c(0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1)
C4 <- c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
Data <- data.frame(C1, C2, C3, C4)

If I want the minimum from C1, C2, and C4, outside a function, I would call:

Data$Min <- pmin(Data$C1, Data$C2, Data$C4)

Inside a function, however, I am stuck. This is as far as I got:

min.col <- function(data, conditions){
  # [[ ]] is the wrong way to refer to the columns named in conditions, but I cannot find the right one
  data$Min <- pmin(data[[conditions]])

  # The function then continues, using data$Min; that part is not relevant to the present problem.
}

To be called by:

min.col(Data, conditions=c("C1", "C2", "C4"))

Anyone there to help? Many thanks in advance!


Solution

  • The following approaches use only base R.

    1) Use do.call("pmin", ...), which passes each selected column to pmin as a separate argument:

    f <- function(data, cols) transform(data, min = do.call("pmin", data[cols]))
    f(Data, c("C1", "C2", "C4"))
    

    giving:

       C1 C2 C3 C4 min
    1   1  0  0  0   0
    2   0  1  1  0   0
    3   1  1  0  0   0
    4   1  1  0  1   1
    5   0  0  0  0   0
    6   0  1  0  0   0
    7   1  0  1  0   0
    8   1  0  1  1   0
    9   0  0  1  0   0
    10  0  1  0  0   0
    11  0  1  1  0   0
    12  1  1  1  1   1
    13  1  0  0  0   0
    14  0  1  0  0   0
    15  0  0  1  0   0
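
To see why do.call works here: data[cols] is a list of columns, and do.call hands each element to pmin as its own argument, so the call matches what one would write by hand outside a function. The sketch below checks that and shows one way the question's function might be completed. The name min_col and the use of unname() are illustrative choices, not part of the answer above.

```r
# Example data from the question.
Data <- data.frame(
  C1 = c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0),
  C2 = c(0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  C3 = c(0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1),
  C4 = c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
)
cols <- c("C1", "C2", "C4")

# Data[cols] is a list of three columns; do.call() spreads them out
# as separate arguments to pmin().
by_hand     <- pmin(Data$C1, Data$C2, Data$C4)
via_do_call <- do.call("pmin", Data[cols])
stopifnot(all(by_hand == via_do_call))

# Hypothetical completion of the question's function (illustrative name).
# unname() stops the column names from being passed as argument names.
min_col <- function(data, conditions) {
  data$Min <- do.call(pmin, unname(as.list(data[conditions])))
  data  # return the modified copy; the caller's data frame is unchanged
}

result <- min_col(Data, cols)
which(result$Min == 1)  # 4 12: the rows where C1, C2 and C4 are all 1
```

Because R copies on modification, the function has to return the data frame and the caller has to assign the result, for example `Data <- min_col(Data, cols)`.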
    

    2) Or use apply to take the minimum of each row:

    f2 <- function(data, cols) transform(data, min = apply(data[cols], 1, min))
    f2(Data, c("C1", "C2", "C4"))
    

    3) Or use Reduce to apply pmin pairwise across the columns:

    f3 <- function(data, cols) transform(data, min = Reduce(pmin, data[cols]))
    f3(Data, c("C1", "C2", "C4"))
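
Reduce folds a list from left to right with a two-argument function, so with pmin it takes the element-wise minimum of the first two columns, then of that result and the third, and so on. A small sketch with made-up vectors (a, b and c3 are not from the question):

```r
# Three illustrative vectors, made up for this sketch.
a  <- c(3, 1, 4)
b  <- c(2, 7, 1)
c3 <- c(5, 0, 9)

# Reduce() folds the list left to right: pmin(pmin(a, b), c3).
folded <- Reduce(pmin, list(a, b, c3))
nested <- pmin(pmin(a, b), c3)

stopifnot(identical(folded, nested))
folded  # 2 0 1
```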
    

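    4) If data[cols] contains only 0s and 1s, the row minimum is 1 exactly when the row contains no 0s, and 0 otherwise. Since 0 is regarded as FALSE and any other number as TRUE when coerced to logical, !data[cols] is TRUE where a cell is 0, and rowSums(!data[cols]) counts the 0s in each row. Negating that count gives TRUE for rows without any 0, and the unary + converts the logical result to 1 or 0: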

    f4 <- function(data, cols) transform(data, min = +!rowSums(!data[cols]))
    f4(Data, c("C1", "C2", "C4"))
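
The one-liner in 4) is compact, so here it is unpacked step by step as a rough sketch. The intermediate names (zeros, n_zero, no_zero, row_min) are introduced only for illustration, and the last lines show why this shortcut should not be used on columns that hold values other than 0 and 1.

```r
# Example data from the question.
Data <- data.frame(
  C1 = c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0),
  C2 = c(0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0),
  C3 = c(0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1),
  C4 = c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0)
)
cols <- c("C1", "C2", "C4")

zeros   <- !Data[cols]      # logical matrix: TRUE where a cell is 0
n_zero  <- rowSums(zeros)   # number of 0s in each row
no_zero <- !n_zero          # TRUE only for rows that contain no 0
row_min <- +no_zero         # unary plus turns TRUE/FALSE into 1/0

# Same answer as pmin() on this 0/1 data.
stopifnot(all(row_min == pmin(Data$C1, Data$C2, Data$C4)))

# Caveat: with values other than 0 and 1 the shortcut is wrong.
other <- data.frame(x = 2, y = 3)
shortcut <- +!rowSums(!other)  # 1, although the true row minimum is 2
```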