Search code examples
rdplyrrowwise

Calculate the number of times a 0 changes to a 1 in a single row across multiple columns


I have presence-absence data across multiple sites and years that look something like this:

df <- tibble(Site = c("A","B","C","D","E"), 
                "1999"=c(0,NA,1,NA,1),
                "2000"=c(1,NA,NA,0,1),
                "2001"=c(NA,0,1,NA,0),
                "2002"=c(NA,1,NA,1,0),
                "2003"=c(0,NA,0,1,NA)
                 )

I am trying to figure out how I can calculate the number of times a 0 changes to a 1 and vice versa and place them in columns at the end of the dataframe. I would also like to be able to calculate the number of times a 1 could have changed to a 0 but didn't and vice versa and place those totals in separate columns at the end of the dataframe.

I understand how to add columns at the end of the dataframe and obtain summary statistics across all rows. e.g.,

## Group input by rows
rowwise(df)

## Add column called "0t1" (to contain the number of times a 0 changed to a 1) and sum across all columns starting with the "19" column, ignoring NAs
df %>%  mutate("0t1" = sum(across(starts_with("19")),na.rm=T))

However this of course just gives me the sum of the number of values within each row.

  Site  `1999` `2000` `2001` `2002` `2003` `0t1`
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>
1 A          0      1     NA     NA      0     2
2 B         NA     NA      0      1     NA     2
3 C          1     NA      1     NA      0     2
4 D         NA      0     NA      1      1     2
5 E          1      1      0      0     NA     2

What I can't seem to figure out now is how to calculate values conditional on the value in the previous non-NA cell, which would produce something that looks like this:

  Site  `1999` `2000` `2001` `2002` `2003` `0t1`  `1t0`
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>
1 A          0      1     NA     NA      0     1    1
2 B         NA     NA      0      1     NA     1    0
3 C          1     NA      1     NA      0     0    1
4 D         NA      0     NA      1      1     1    0
5 E          1      1      0      0     NA     0    1


Solution

  • Here, we can use lead to create the condition after removing the NA elements in each row (na.omit). The condition is to check the current value as 0 and the next as 1 and take the sum

     apply(df[,-1], 1, function(x) {x1 <- na.omit(x); sum(x1 == 0 & lead(x1) == 1, na.rm = TRUE)})
    

    Or the same with dplyr

    library(dplyr)
    df %>%
         rowwise %>%
         mutate(t1 = {x1 <- na.omit(c_across(where(is.numeric)))
                      sum(x1 == 0 & lead(x1) ==1, na.rm = TRUE)
                  }) %>%
         ungroup
    

    -output

    # A tibble: 5 x 7
    #  Site  `1999` `2000` `2001` `2002` `2003`    t1
    #  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <int>
    #1 A          0      1     NA     NA      0     1
    #2 B         NA     NA      0      1     NA     1
    #3 C          1     NA      1     NA      0     0
    #4 D         NA      0     NA      1      1     1
    #5 E          1      1      0      0     NA     0