Search code examples
rdplyrnaminimum

Min() ignoring zeros and NA with dplyr


I have a df that looks like this:

group year
1     2020 
1     NA
1     0
2     2021
2     2006
3     NA
3     0 
3     2010
3     2010
4     2006
4     2005
4     2010

And I want to group by group and then find the minimum year while ignoring NAs and 0 entries:

group year minYr
1     2020 2020
1     NA   2020
1     0    2020
2     2021 2006
2     2006 2006
3     NA   2010
3     0    2010
3     2010 2010
3     2010 2010
4     2006 2005
4     2005 2005
4     2010 2005

My initial approach df <- df %>% group_by(group) %>% mutate (minYr = min(year, na.rm = TRUE)) caused a runtime error and didn't take care of the zeros.

Does anyone have a better way of doing this?


Solution

  • df1 %>%
      group_by(group) %>% 
      mutate(minYr = min(year[year > 0], na.rm = TRUE)) %>%
      # mutate(minYr = min(year[year > 0 & !is.na(year)])) %>% # equivalent
      ungroup()
    
    # A tibble: 12 × 3
       group  year minYr
       <dbl> <dbl> <dbl>
     1     1  2020  2020
     2     1    NA  2020
     3     1     0  2020
     4     2  2021  2006
     5     2  2006  2006
     6     3    NA  2010
     7     3     0  2010
     8     3  2010  2010
     9     3  2010  2010
    10     4  2006  2005
    11     4  2005  2005
    12     4  2010  2005
    

    df1 <- structure(list(group = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4), 
                          year = c(2020, NA, 0, 2021, 2006, NA, 0, 2010, 2010, 2006, 2005, 2010)), 
                     class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -12L))