
Having trouble with which.min inside dplyr pipe


I'm having some trouble with the which.min function inside a dplyr pipe. I have a cumbersome solution (*) and I'm looking for a more compact and elegant way to do this.

  1. Reproducible example:
library(dplyr)

data=data.frame(s1=c(10,NA,5,NA,NA),s2=c(8,NA,NA,4,20),s3=c(NA,NA,2,NA,10))
data
#>   s1 s2 s3
#> 1 10  8 NA
#> 2 NA NA NA
#> 3  5 NA  2
#> 4 NA  4 NA
#> 5 NA 20 10
  2. Min value:

Here, with min(x, na.rm=TRUE), I can extract the min value:

data%>%
  rowwise()%>%
  mutate(Min_s=min(c(s1,s2,s3),na.rm=TRUE))
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `Min_s = min(c(s1, s2, s3), na.rm = TRUE)`.
#> ℹ In row 2.
#> Caused by warning in `min()`:
#> ! no non-missing arguments to min; returning Inf
#> # A tibble: 5 × 4
#> # Rowwise: 
#>      s1    s2    s3 Min_s
#>   <dbl> <dbl> <dbl> <dbl>
#> 1    10     8    NA     8
#> 2    NA    NA    NA   Inf
#> 3     5    NA     2     2
#> 4    NA     4    NA     4
#> 5    NA    20    10    10
  3. Extracting the variable containing the min value:

Here I'm having trouble extracting which variable contains the min value:

data%>%
  rowwise()%>%
  mutate(which_s=which.min(c(s1,s2,s3)))
#> Error in `mutate()`:
#> ℹ In argument: `which_s = which.min(c(s1, s2, s3))`.
#> ℹ In row 2.
#> Caused by error:
#> ! `which_s` must be size 1, not 0.
#> ℹ Did you mean: `which_s = list(which.min(c(s1, s2, s3)))` ?

# Solution (*)
data%>%
  rowwise()%>%
  mutate(which_s=if(!is.na(s1)|!is.na(s2)|!is.na(s3)) {which.min(c(s1,s2,s3))} else NA )
#> # A tibble: 5 × 4
#> # Rowwise: 
#>      s1    s2    s3 which_s
#>   <dbl> <dbl> <dbl>   <int>
#> 1    10     8    NA       2
#> 2    NA    NA    NA      NA
#> 3     5    NA     2       3
#> 4    NA     4    NA       2
#> 5    NA    20    10       3

Created on 2024-11-07 with reprex v2.1.0


Solution

  • In your second row every value is NA, so which.min() returns integer(0) for which_s. Under rowwise(), mutate() needs a size-1 result for each row, and that is why the call fails with an error.

    Instead, you could first store the results in a list column and then unnest it with tidyr's unnest(). Don't forget to set keep_empty = TRUE in unnest(), otherwise the row with the empty result is dropped.

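    library(tidyr)  # provides unnest()

    data %>%
        rowwise() %>%
        # wrap in list() so a zero-length result is allowed per row
        mutate(which_s = list(which.min(c(s1, s2, s3)))) %>%
        # keep_empty = TRUE turns the empty element into NA instead of dropping the row
        unnest(which_s, keep_empty = TRUE)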
    

    which gives

    # A tibble: 5 × 4
         s1    s2    s3 which_s
      <dbl> <dbl> <dbl>   <int>
    1    10     8    NA       2
    2    NA    NA    NA      NA
    3     5    NA     2       3
    4    NA     4    NA       2
    5    NA    20    10       3
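
As a possible alternative (a minimal sketch, not part of the answer above), the zero-length result could be padded to size 1 by indexing it with [1], since integer(0)[1] evaluates to NA in base R. This avoids the list column and tidyr altogether. The names empty, cols, res and which_name below are hypothetical and only used for illustration; which_name is an extra column that maps the index back to the variable name.

```r
library(dplyr)

data <- data.frame(
  s1 = c(10, NA, 5, NA, NA),
  s2 = c(8, NA, NA, 4, 20),
  s3 = c(NA, NA, 2, NA, 10)
)

# which.min() discards NAs, so an all-NA input gives a zero-length result
empty <- which.min(c(NA_real_, NA_real_, NA_real_))
length(empty)  # 0

# Hypothetical helper vector: column names in the same order as c(s1, s2, s3)
cols <- c("s1", "s2", "s3")

res <- data %>%
  rowwise() %>%
  mutate(
    # [1] turns integer(0) into NA, so every row yields a size-1 value
    which_s = which.min(c(s1, s2, s3))[1],
    # map the index back to the variable name (NA for the all-NA row)
    which_name = cols[which_s]
  ) %>%
  ungroup()

res
```

Note that which.min() returns only the first position when several values tie for the minimum, so the [1] does not discard anything in rows that have at least one non-missing value.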