Search code examples
rcumsum

How to use cumulative sum function for grouped data with missing observations


I work with a data frame that that looks like that:

enter image description here

    DATUM               CP                SMER  TRH   MNOZSTVI  CENA POPLATKY   OBJEM UCET  KVARTAL   ROK AKTUALNI.MNOZSTVI
   <dttm>              <chr>             <chr> <chr>    <dbl> <dbl>    <dbl>   <dbl> <chr> <chr>   <dbl>             <dbl>
 1 2020-03-03 00:00:00 CEZ               K     BCPP        50 465.      91.3 -23240  CZK   Q1       2020                NA
 2 2020-03-04 00:00:00 CEZ               K     BCPP        50 467.      58.9 -13980  CZK   Q1       2020                NA
 3 2020-03-12 00:00:00 CEZ               P     BCPP        30 398       51.8  11940  CZK   Q1       2020                NA
 4 2020-03-25 00:00:00 KOMERCNI BANKA    K     BCPP        40 542       85.9 -21680  CZK   Q1       2020                NA
 5 2020-03-25 00:00:00 MONETA MONEY BANK K     BCPP       300  58.4     71.3 -17505  CZK   Q1       2020                NA
 6 2020-03-30 00:00:00 CEZ               K     BCPP        10 391       50    -3910  CZK   Q1       2020                NA
 7 2020-04-02 00:00:00 USD               K     NA        1000  25.8      0   -25778  CZK   Q2       2020                NA
 8 2020-04-03 00:00:00 USD               K     NA        3000  26.1      0   -78392  CZK   Q2       2020                NA
 9 2020-04-04 00:00:00 USD               K     NA        1000  26.4      0   -26363. CZK   Q2       2020                NA
10 2020-04-06 00:00:00 AVAST             K     BCPP       150 125.      75.8 -18810  CZK   Q2       2020                NA

And I would like to fill cumulative sum of variable MNOZSTVI into variable AKTUALNI.MNOZSTVI grouped by CP. So vector AKTUALNI.MNOZSTVI should be c(50,100,130,40,300,140,1000,4000,5000,150, etc.).

The problem is that some values of MNOZSTVI are missing and so I dont know how to use function cumsun() that can't cope with missing values + I struggle to perform it for grouped data.

Does anybody know how to do that either woth the help of cumsum() or some other function? Thank you.


Solution

  • We can group by 'CP' and get the cumsum of 'MNOZSTVI' in mutate

    library(dplyr)
    df1 <- df1 %>%
         group_by(CP) %>%
         mutate(AKTUALNI.MNOZSTVI  = cumsum(MNOZSTVI))
    

    Or use base R with ave

    df1$AKTUALNI.MNOZSTVI <- with(df1, ave(MNOZSTVI, CP, FUN = cumsum))