Search code examples
rdplyr

Count up and down from when condition is met, per category


I have a df:

df <- data.frame(nr = c(rep("n01", 10),
                    rep("n03", 13),
                    rep("n04", 8),
                    rep("n06", 14),
                    rep("n08", 13),
                    rep("n12", 14),
                    rep("n14", 10)),
             yr = c(2012:2021, 
                    2010:2022,
                    2013:2020,
                    2010:2023,
                    2011:2023,
                    2009:2022,
                    2011:2020),
             sp = c(rep(NA, 12),
                    "tr",
                    rep(NA, 27),
                    "tr",
                    rep(NA, 21),
                    "tr",
                    rep(NA, 19)))

I want to count back and on from when the condition sp == "tr" is met. This should happen for each nr separately. If for a specific nr there is no such condition, it should be NA. So in the end I want to end up with:

df <- data.frame(nr = c(rep("n01", 10),
                    rep("n03", 13),
                    rep("n04", 8),
                    rep("n06", 14),
                    rep("n08", 13),
                    rep("n12", 14),
                    rep("n14", 10)),
             yr = c(2012:2021, 
                    2010:2022,
                    2013:2020,
                    2010:2023,
                    2011:2023,
                    2009:2022,
                    2011:2020),
             sp = c(rep(NA, 12),
                    "tr",
                    rep(NA, 27),
                    "tr",
                    rep(NA, 21),
                    "tr",
                    rep(NA, 19)),
             count = c(rep(NA, 10),
                       -2:10,
                       rep(NA, 8),
                       -9:4,
                       rep(NA, 13),
                       -4:9,
                       rep(NA, 10)))

How do I end up with such a column? I guess I start with group_by(nr) |> mutate(count = ... and something with cumsum but from there I'm lost.


Solution

  • You can use row_number to get the current row number, which to get the row number when sp == "tr", and if() ... else ... to deal with groups with only NAs:

    df |>
      mutate(count = if(any(sp == "tr", na.rm = TRUE)) row_number() - which(sp == "tr") else NA, 
             .by = nr)
    

    If you have only one "tr" per group, or only want its first occurence, you can also use match, which deals more efficiently with NAs.

    df |>
      mutate(count = row_number() - match("tr", sp), .by = nr)
    

    output

    #head(df, 23)
    #     nr   yr   sp count
    # 1  n01 2012 <NA>    NA
    # 2  n01 2013 <NA>    NA
    # 3  n01 2014 <NA>    NA
    # 4  n01 2015 <NA>    NA
    # 5  n01 2016 <NA>    NA
    # 6  n01 2017 <NA>    NA
    # 7  n01 2018 <NA>    NA
    # 8  n01 2019 <NA>    NA
    # 9  n01 2020 <NA>    NA
    # 10 n01 2021 <NA>    NA
    # 11 n03 2010 <NA>    -2
    # 12 n03 2011 <NA>    -1
    # 13 n03 2012   tr     0
    # 14 n03 2013 <NA>     1
    # 15 n03 2014 <NA>     2
    # 16 n03 2015 <NA>     3
    # 17 n03 2016 <NA>     4
    # 18 n03 2017 <NA>     5
    # 19 n03 2018 <NA>     6
    # 20 n03 2019 <NA>     7
    # 21 n03 2020 <NA>     8
    # 22 n03 2021 <NA>     9
    # 23 n03 2022 <NA>    10