Search code examples
rcumulative-sum

I want to cumulate a list of string values by date


I have column A, which is a date and B which is a string value.

  A B
1 01/02/23 apple
2 01/02/23 pear
3 01/02/23 apple
4 02/02/23 apple
3 02/02/23 apple
3 02/02/23 banana
3 02/02/23 banana
3 02/02/23 apple

I would like the output to be a list on unique dates, with cumulation day on day of the string values in B.

Like so

  A B
1 01/02/23 apple
    pear
1 02/02/23 apple
    pear
    banana

Thank you in advance!!!!

I've tried basic cumulations but cant seem to get to the desired result when i use string value not numbers.


Solution

  • Update removed first answer: We can do it using accumulate together with union and some data tweaking:

    library(dplyr)
    library(tidyr) 
    
    df %>% 
      arrange(A) %>% 
      mutate(cumulative = accumulate(B, ~union(.x, .y))) %>% 
      reframe(B = last(cumulative), .by=A) %>% 
      unnest(B) %>% 
      mutate(A = ifelse(row_number() != 1, "", A), .by=A)
    
    
      A      B
    1 1  apple
    2     pear
    3 2  apple
    4     pear
    5   banana