Search code examples
rpercentagerollapply

Rollapply percentage from logical conditions (Rolling rate in R )


I have a data frame in R with two columns with logical conditions that looks like this :

check1 = as.logical(c(rep("TRUE",3),rep("FALSE",2),rep("TRUE",3),rep("FALSE",2)))
check2 = as.logical(c(rep("TRUE",5),rep("FALSE",2),rep("TRUE",3)))
dat = cbind(check1,check2)

resulting to :

    check1 check2
 [1,]   TRUE   TRUE
 [2,]   TRUE   TRUE
 [3,]   TRUE   TRUE
 [4,]  FALSE   TRUE
 [5,]  FALSE   TRUE
 [6,]   TRUE  FALSE
 [7,]   TRUE  FALSE
 [8,]   TRUE   TRUE
 [9,]  FALSE   TRUE
[10,]  FALSE   TRUE

I want to roll calculate the percentage of TRUEs on each column which ideally must look like this :

check1 check2
1/1 1/1
2/2 2/2
3/3 3/3
3/4 4/4
3/5 5/5
4/6 5/6
5/7 5/7
6/8 6/8
6/9 7/9
6/10 8/10

maybe ...

dat%>%
  mutate(cumsum(check1)/seq_along(check1))

Any help ?


Solution

  • You are almost there; just use across to apply your function to both columns.

    Alternatively, you can use dplyr::cummean to compute the running proportions.

    A note about terminology: rolling usually refers to computing a statistic (such as the mean or the max) within a fixed-size window. On the other hand, cumulative statistics are computed in an ever-increasig window starting from index 1 (or the first row). See the vignette on window functions. Using the right term may help you to search the documentation for the appropriate function.

    library("tidyverse")
    
    check1 <- as.logical(c(rep("TRUE", 3), rep("FALSE", 2), rep("TRUE", 3), rep("FALSE", 2)))
    check2 <- as.logical(c(rep("TRUE", 5), rep("FALSE", 2), rep("TRUE", 3)))
    dat <- cbind(check1, check2)
    
    cummeans <- as_tibble(dat) %>%
      mutate(
        across(c(check1, check2), ~ cumsum(.) / row_number())
      )
    
    cummeans <- as_tibble(dat) %>%
      mutate(
        across(c(check1, check2), cummean)
      )
    cummeans
    #> # A tibble: 10 × 2
    #>    check1 check2
    #>     <dbl>  <dbl>
    #>  1  1      1    
    #>  2  1      1    
    #>  3  1      1    
    #>  4  0.75   1    
    #>  5  0.6    1    
    #>  6  0.667  0.833
    #>  7  0.714  0.714
    #>  8  0.75   0.75 
    #>  9  0.667  0.778
    #> 10  0.6    0.8
    
    # Plot the cumulative proportions on the y-axis, with one panel for each check
    cummeans %>%
      # The example data has no index column; will use the row ids instead
      rowid_to_column() %>%
      pivot_longer(
        c(check1, check2),
        names_to = "check",
        values_to = "cummean"
      ) %>%
      ggplot(
        aes(rowid, cummean, color = check)
      ) +
      geom_line() +
      # Proportions have a natural range from 0 to 1
      scale_y_continuous(
        limits = c(0, 1)
      )
    

    Created on 2022-03-14 by the reprex package (v2.0.1)