I have a df:
df <- data.frame(nr = c(rep("n01", 10),
rep("n03", 13),
rep("n04", 8),
rep("n06", 14),
rep("n08", 13),
rep("n12", 14),
rep("n14", 10)),
yr = c(2012:2021,
2010:2022,
2013:2020,
2010:2023,
2011:2023,
2009:2022,
2011:2020),
sp = c(rep(NA, 12),
"tr",
rep(NA, 27),
"tr",
rep(NA, 21),
"tr",
rep(NA, 19)))
I want to count back and on from when the condition sp == "tr"
is met. This should happen for each nr
separately. If for a specific nr
there is no such condition, it should be NA.
So in the end I want to end up with:
df <- data.frame(nr = c(rep("n01", 10),
rep("n03", 13),
rep("n04", 8),
rep("n06", 14),
rep("n08", 13),
rep("n12", 14),
rep("n14", 10)),
yr = c(2012:2021,
2010:2022,
2013:2020,
2010:2023,
2011:2023,
2009:2022,
2011:2020),
sp = c(rep(NA, 12),
"tr",
rep(NA, 27),
"tr",
rep(NA, 21),
"tr",
rep(NA, 19)),
count = c(rep(NA, 10),
-2:10,
rep(NA, 8),
-9:4,
rep(NA, 13),
-4:9,
rep(NA, 10)))
How do I end up with such a column?
I guess I start with group_by(nr) |> mutate(count = ...
and something with cumsum
but from there I'm lost.
You can use row_number
to get the current row number, which
to get the row number when sp == "tr"
, and if() ... else ...
to deal with groups with only NAs:
df |>
mutate(count = if(any(sp == "tr", na.rm = TRUE)) row_number() - which(sp == "tr") else NA,
.by = nr)
If you have only one "tr" per group, or only want its first occurence, you can also use match
, which deals more efficiently with NA
s.
df |>
mutate(count = row_number() - match("tr", sp), .by = nr)
output
#head(df, 23)
# nr yr sp count
# 1 n01 2012 <NA> NA
# 2 n01 2013 <NA> NA
# 3 n01 2014 <NA> NA
# 4 n01 2015 <NA> NA
# 5 n01 2016 <NA> NA
# 6 n01 2017 <NA> NA
# 7 n01 2018 <NA> NA
# 8 n01 2019 <NA> NA
# 9 n01 2020 <NA> NA
# 10 n01 2021 <NA> NA
# 11 n03 2010 <NA> -2
# 12 n03 2011 <NA> -1
# 13 n03 2012 tr 0
# 14 n03 2013 <NA> 1
# 15 n03 2014 <NA> 2
# 16 n03 2015 <NA> 3
# 17 n03 2016 <NA> 4
# 18 n03 2017 <NA> 5
# 19 n03 2018 <NA> 6
# 20 n03 2019 <NA> 7
# 21 n03 2020 <NA> 8
# 22 n03 2021 <NA> 9
# 23 n03 2022 <NA> 10