
Create new column (row-wise min, or NA) from multiple selected columns


My data has many columns and subjects, but to keep the illustration simple, let's say I have 7 subjects with 3 variables/columns called x1, x2 and x3 (values range from 1 to 3, plus NAs). In my analysis it is important that I explicitly name the columns I want to use, because the data frame contains other variables/columns, so I cannot just pass in the whole data frame.

data <- data.frame(id = c(1,2,3,4,5,6,7), x1 = c(1,2,2,NA,3,3,1), x2 = c(NA,3,1,NA,2,3,2), x3 = c(NA,2,NA,NA,3,NA,1))
    id  x1  x2  x3
    1   1   NA  NA
    2   2   3   2
    3   2   1   NA
    4   NA  NA  NA
    5   3   2   3
    6   3   3   NA
    7   1   2   1

x1, x2 and x3 are all of class numeric. From them, I want to create a variable/column called 'x4' that:

- gives me the lowest value of x1, x2 and x3 in each row.

- If there is an NA among x1, x2, x3 in a row, the NA should be ignored.

- If they are ALL NA, however, I want the outcome to be NA (NOT Inf, which is what my code gives now).

- If two of the lowest values are tied, just display either one of them.

So like this:

data <- data.frame(id = c(1,2,3,4,5,6,7), x1 = c(1,2,2,NA,3,3,1), x2 = c(NA,3,1,NA,2,3,2), x3 = c(NA,2,NA,NA,3,NA,1), x4 = c(1,2,1,NA,2,3,1))
    id  x1  x2  x3  x4
    1   1   NA  NA  1
    2   2   3   2   2
    3   2   1   NA  1
    4   NA  NA  NA  NA
    5   3   2   3   2
    6   3   3   NA  3
    7   1   2   1   1

I managed to find a very similar question, "min for each row with dataframe in R", and I can mostly make its approach work:

data$x4 <- apply(data[, c("x1","x2","x3")],1, FUN=min, na.rm = TRUE)

The problem I have now is that when all values are NA (id number 4), my outcome is not NA but 'Inf'.

Question 1: How can I make it return NA instead of Inf? I can of course fix that afterwards like this:

is.na(data$x4) <- sapply(data$x4, is.infinite)

But I wonder if there is a nice way to do that already with/inside the previous code?

Also, rather than using apply with min as its FUN argument, I would like to try something like the code below. Question 2: is using this other code possible?

data$x4 <- min(data[, c("x1","x2","x3")],1 , na.rm = TRUE)

With this, x4 gets the value '1' every time. I guess it just shows the lowest number (1) of the whole column? I don't understand why. I am already passing ', 1', but that doesn't help.

I hope somebody can help me (R and Stack Overflow newbie) out, thanks!


Solution

  • You are looking for the pmin function, which returns the parallel (element-wise) minima of its input vectors. Below are two approaches using pmin (df here is the data frame called data in the question):

    # Approach 1: using do.call (df[, -1] drops the id column)
    df$minIget <- do.call(pmin, c(df[, -1], na.rm = TRUE))

    # Approach 2: using the tidyverse
    library(dplyr)
    df %>% rowwise() %>% mutate(minIget = pmin(x1, x2, x3, na.rm = TRUE))
    

    Output:

    # A tibble: 7 x 5
    # Rowwise: 
         id    x1    x2    x3 minIget
      <dbl> <dbl> <dbl> <dbl>   <dbl>
    1     1     1    NA    NA       1
    2     2     2     3     2       2
    3     3     2     1    NA       1
    4     4    NA    NA    NA      NA
    5     5     3     2     3       2
    6     6     3     3    NA       3
    7     7     1     2     1       1
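
The two numbered questions can also be addressed directly. The following is only a sketch in base R, using the example data from the question: `safe_min` is a hypothetical helper name (not a built-in function), and `cols`, `x4_apply`, `x4_pmin` and `overall` are names introduced here for illustration.

```r
# Rebuild the example data from the question.
data <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7),
  x1 = c(1, 2, 2, NA, 3, 3, 1),
  x2 = c(NA, 3, 1, NA, 2, 3, 2),
  x3 = c(NA, 2, NA, NA, 3, NA, 1)
)
cols <- c("x1", "x2", "x3")  # the columns to compare, named explicitly

# Hypothetical helper: like min(), but gives NA (not Inf) when all values are NA.
safe_min <- function(v) {
  if (all(is.na(v))) NA_real_ else min(v, na.rm = TRUE)
}

# Question 1: row-wise minimum via apply(), no Inf clean-up needed afterwards.
data$x4_apply <- apply(data[, cols], 1, safe_min)

# Same result with pmin(), selecting columns by name instead of df[, -1].
data$x4_pmin <- do.call(pmin, c(data[cols], na.rm = TRUE))

# Question 2: min() pools ALL of its arguments into one overall minimum.
# The extra `1` is just another value in that pool, not a row margin as in
# apply(), so a single number comes back and is recycled into every row.
overall <- min(data[, cols], 1, na.rm = TRUE)
```

Both `x4_apply` and `x4_pmin` should contain NA for id 4. `pmin` is vectorised over whole columns, so it is likely the better fit for large data; the `apply` version is mainly useful when the per-row function is something other than a plain minimum.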