I have presence-absence data across multiple sites and years that look something like this:
df <- tibble(Site = c("A","B","C","D","E"),
"1999"=c(0,NA,1,NA,1),
"2000"=c(1,NA,NA,0,1),
"2001"=c(NA,0,1,NA,0),
"2002"=c(NA,1,NA,1,0),
"2003"=c(0,NA,0,1,NA)
)
I am trying to figure out how I can calculate the number of times a 0 changes to a 1 and vice versa and place them in columns at the end of the dataframe. I would also like to be able to calculate the number of times a 1 could have changed to a 0 but didn't and vice versa and place those totals in separate columns at the end of the dataframe.
I understand how to add columns at the end of the dataframe and obtain summary statistics across all rows. e.g.,
## Group input by rows
rowwise(df)
## Add column called "0t1" (to contain the number of times a 0 changed to a 1) and sum across all columns starting with the "19" column, ignoring NAs
df %>% mutate("0t1" = sum(across(starts_with("19")),na.rm=T))
However this of course just gives me the sum of the number of values within each row.
Site `1999` `2000` `2001` `2002` `2003` `0t1`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 0 1 NA NA 0 2
2 B NA NA 0 1 NA 2
3 C 1 NA 1 NA 0 2
4 D NA 0 NA 1 1 2
5 E 1 1 0 0 NA 2
What I can't seem to figure out now is how to calculate values conditional on the value in the previous non-NA cell, which would produce something that looks like this:
Site `1999` `2000` `2001` `2002` `2003` `0t1` `1t0`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 0 1 NA NA 0 1 1
2 B NA NA 0 1 NA 1 0
3 C 1 NA 1 NA 0 0 1
4 D NA 0 NA 1 1 1 0
5 E 1 1 0 0 NA 0 1
Here, we can use lead
to create the condition after removing the NA
elements in each row (na.omit
). The condition is to check the current value as 0 and the next as 1 and take the sum
apply(df[,-1], 1, function(x) {x1 <- na.omit(x); sum(x1 == 0 & lead(x1) == 1, na.rm = TRUE)})
Or the same with dplyr
library(dplyr)
df %>%
rowwise %>%
mutate(t1 = {x1 <- na.omit(c_across(where(is.numeric)))
sum(x1 == 0 & lead(x1) ==1, na.rm = TRUE)
}) %>%
ungroup
-output
# A tibble: 5 x 7
# Site `1999` `2000` `2001` `2002` `2003` t1
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#1 A 0 1 NA NA 0 1
#2 B NA NA 0 1 NA 1
#3 C 1 NA 1 NA 0 0
#4 D NA 0 NA 1 1 1
#5 E 1 1 0 0 NA 0