Search code examples
rsumlagshift

How do you add rows that have NA as prior columns in R?


All,

I have a dataframe with Dates on the first column and categories across as such:

Accounts <- c('A','B','C','D',
          'A','B','C','D',
          'A','B','C','D')
Dates <- as.Date(c('2016-01-31', '2016-01-31','2016-01-31','2016-01-31',
               '2016-02-28','2016-02-28','2016-02-28','2016-02-28',
               '2016-03-31','2016-03-31','2016-03-31','2016-03-31'))
Balances <- c(100,NA,NA,NA,
          90,50,10,NA,
          80,40,5,120)
Origination <- data.frame(Dates,Accounts,Balances)

library(reshape2)
Origination <- dcast(Origination,Dates ~ Accounts, value.var = "Balances")

       Dates   A  B  C   D
1 2016-01-31 100 NA NA  NA
2 2016-02-28  90 50 10  NA
3 2016-03-31  80 40  5 120

The goal is to sum rows where the prior values is NA. I tried to used lag or shift but don't have the knowledge to pull it off.

So for this dataframe I would like a Totals column at the end it values 60 (50 + 10) and 120 for February and March.

Is this doable?

Regards, Aksel


Solution

  • Shift the selection down a row, filter out all the non-NA's as 0, and then use rowSums:

    sel <- rbind(FALSE, !is.na(head(Origination[-1], -1)))
    #sel
    #         A     B     C     D
    #[1,] FALSE FALSE FALSE FALSE
    #[2,]  TRUE FALSE FALSE FALSE
    #[3,]  TRUE  TRUE  TRUE FALSE
    
    rowSums(replace(Origination[-1], sel, 0), na.rm=TRUE)
    #[1] 100  60 120
    

    If you want the first row to be totally excluded, rather than totally included, just change the FALSE to TRUE:

    sel <- rbind(TRUE, !is.na(head(Origination[-1], -1)))
    rowSums(replace(Origination[-1], sel, 0), na.rm=TRUE)
    #[1]   0  60 120