Search code examples
rdplyrsummarize

Summarize with arithmetic operations on rows by column entry


library(tidyverse)
set.seed(1)
start <- mdy("01/01/2022")
end <- start + as.difftime(4, units = "days")
days <- seq(from = start, to = end, by = 1)
days <- sample(days, 100, replace = T)
flip <- sample(c("Heads", "Tails"), 100, replace = TRUE)
numbers <- rchisq(100, 30)

df <- tibble(days, numbers, flip)

I have this dataframe and would like to summarize it for each day by taking the total sum of numbers for each flip and creating this ratio, (Heads-Tails)/(Heads+Tails) below is an example for a particular day:

df <- df %>% group_by(days, flip) %>%
  summarize(total = sum(numbers)) %>%
  ungroup()

> (df[[1,3]]-df[[2,3]])/(df[[1,3]]+df[[2,3]])
[1] -0.4164241
> (df[[3,3]]-df[[4,3]])/(df[[3,3]]+df[[4,3]])
[1] 0.06134484
> (df[[5,3]]-df[[6,3]])/(df[[5,3]]+df[[6,3]])
[1] 0.6067984
> (df[[7,3]]-df[[8,3]])/(df[[7,3]]+df[[8,3]])
[1] -0.2603271
> (df[[9,3]]-df[[10,3]])/(df[[9,3]]+df[[10,3]])
[1] 0.309745

These are the ratios I would like to end up with for every day.


Solution

  • Starting from the first df,

    df %>%
      group_by(days, flip) %>%
      summarize(total = sum(numbers)) %>%
      group_by(days) %>%
      summarize(z = -diff(total)/sum(total))
    # # A tibble: 5 × 2
    #   days             z
    #   <date>       <dbl>
    # 1 2022-01-01 -0.416 
    # 2 2022-01-02  0.0613
    # 3 2022-01-03  0.607 
    # 4 2022-01-04 -0.260 
    # 5 2022-01-05  0.310 
    

    though this relies on the order of flip. To be a little more resilient to that, we can do:

    ... %>%
      summarize(z = (total[flip == "Heads"] - total[flip == "Tails"]) / sum(total))
    

    We can also try a pivoting approach:

    df %>%
      group_by(days, flip) %>%
      summarize(total = sum(numbers)) %>%
      ungroup() %>%
      pivot_wider(id_cols = days, names_from = flip, values_from = total) %>%
      mutate(z = (Heads - Tails) / (Heads + Tails))
    # # A tibble: 5 × 4
    #   days       Heads Tails       z
    #   <date>     <dbl> <dbl>   <dbl>
    # 1 2022-01-01  205. 498.  -0.416 
    # 2 2022-01-02  315. 279.   0.0613
    # 3 2022-01-03  363.  88.8  0.607 
    # 4 2022-01-04  227. 386.  -0.260 
    # 5 2022-01-05  403. 212.   0.310