Search code examples
rfunctionloopsmultiple-columns

Apply calculation to specified number of columns and store the result in separate objects, to be combined at the end


I have a data frame in which I want to apply a calculation to a varying amount of columns that are specified, and store the results in separate objects, to be combined at the end.

A minimal example would look like:

Name <- c("Case 1", "Case 2", "Case 3", "Case 4", "Case 5")
Base <- c(0, 0, 0, 1, 1)
C1 <- c(1, 0, 1, 1, 0)
C2 <- c(0, 1, 1, 1, 0)
C3 <- c(0, 1, 0, 0, 0)
C4 <- c(1, 1, 0, 1, 0)
Data <- data.frame(Name, Base, C1, C2, C3, C4)

score.calc <- function(data, col.names){

                       # This is how I would to it outside a function and without loop:
                       Score1 <- sum(pmin(Data$C1, pmin(Data$Base)))/sum(pmin(Data$Base))
                       Score2 <- sum(pmin(Data$C2, pmin(Data$Base)))/sum(pmin(Data$Base))
                       Score3 <- sum(pmin(Data$C3, pmin(Data$Base)))/sum(pmin(Data$Base))
                       Scores <- c(Score1, Score2, Score3)
}

new.score <- score.calc(Data,
                        col.names= c("C1", "C2", "C3"))

And should return:

> new.score
[1] 0.5 0.5 0.0

Anyone with an idea? Many thanks!


Solution

  • Try this:

    score.calc <- function(data, col.names, base = "Base") {
      sapply(subset(data, select = col.names),
             function(z) sum(pmin(z, Data[[base]]))/sum(Data[[base]]))
    }
    score.calc(Data, c("C1", "C2", "C3"))
    #  C1  C2  C3 
    # 0.5 0.5 0.0 
    

    The changes I made:

    1. Changed from the hard-coded $C1 (etc) to a more dynamic data[[nm]] mindset;
    2. Changed the hard-coded $Base to one based on the arguments, and with a default value so that while you shouldn't need to change it, if you want to apply it to different data (with a different "Base" column), you still can;
    3. Did it dynamically with sapply, which will return (here) a vector the same length as col.names ... assuming that all columns provided are present in the data and numeric-like;
    4. Use subset(., select=) instead of [, since the latter can drop to a vector instead of a single-column frame in some circumstances (i.e., base::[.data.frame and a simple data.frame, not a tbl_df).
    5. Removed two extraneous pmin. Its use with a single argument is a no-op: its functionality is providing the minimum element-wise between two or more vectors; to apply it to a single vector does nothing.