Search code examples
rdplyrbusiness-intelligencemodeling

I need to add several rows together based on the fact that they have something in common with another row


Using the information on hand I need to predict how much of a particular product we need next month. I have several months worth of data going back, however the data is separated by both VPN and by a separate warehouse number. I just need to know how much to order in general and ignore the warehouse separation. we'll be adding that back in later.

There are multiple duplicates of many of the VPN's and i would like to consolidate all the duplicates and also sum the numbers that have been separated.

 VPN         Month To Date December November October September August July June  May April March

0A36227-AA            15        6        4       2        NA      4    6    4    2  <NA>     4
0A36227-AA            NA        1       NA      NA        NA     NA    1 <NA> <NA>  <NA>  <NA>
0A36227-AA             2        3        1      NA         2      3    3    1 <NA>     2     3
0A36258-AA            NA       NA       NA       1        NA     NA <NA> <NA>    1  <NA>  <NA>
0A36258-AA             1       NA        1      NA        NA     NA <NA>    1 <NA>  <NA>  <NA>
0A36258-AA            NA       NA       NA       1        NA     NA <NA> <NA>    1  <NA>  <NA>
0A36258-AA             1       NA       NA      NA        NA     NA <NA> <NA> <NA>  <NA>  <NA>

So i want to combine all the duplicates and add all the numbers from the rows into just one row per VPN.

I've tried using the aggregate function and it didn't work for me. i may have used it wrong though.

any help would be appreciated!

also there are some cases where it may cause an infinite number to show up. if anyone has any further advice for how to handle that it would be welcome.


Solution

  • You basically want to know how to perform sum while grouping in your data frame. You will find plenty of answer. I have a data.table solution for your case:

    plouf <- read.table(text = "   VPN  Month.To.Date December November October September August July June  May April March
    
                           0A36227-AA            15        6        4       2        NA      4    6    4    2  <NA>     4
                           0A36227-AA            NA        1       NA      NA        NA     NA    1 <NA> <NA>  <NA>  <NA>
                           0A36227-AA             2        3        1      NA         2      3    3    1 <NA>     2     3
                           0A36258-AA            NA       NA       NA       1        NA     NA <NA> <NA>    1  <NA>  <NA>
                           0A36258-AA             1       NA        1      NA        NA     NA <NA>    1 <NA>  <NA>  <NA>
                           0A36258-AA            NA       NA       NA       1        NA     NA <NA> <NA>    1  <NA>  <NA>
                           0A36258-AA             1       NA       NA      NA        NA     NA <NA> <NA> <NA>  <NA>  <NA>",
                        stringsAsFactors = FALSE, header = TRUE)
    

    here is the code

    DT <- setDT(plouf)
    tochange <- names(DT)[!names(DT) %in% "VPN"]
    

    here the tochange vector is the list of your column you want to average

    DT[,c(tochange) := lapply(.SD,function(x){as.numeric(x)}),.SDcols = tochange]
    DT[,lapply(.SD,function(x){sum(x,na.rm = TRUE)}),.SDcols = tochange,by = VPN]
    

    The first line is to set everything to numeric¨

    The second line perform the sum ignoring the NAs and grouping by VPN. I am not 100% sure that is what you wanted.

              VPN Month.To.Date December November October September August July June May April March  i
    1: 0A36227-AA            17       10        5       2         2      7   10    5   2     2     7 10
    2: 0A36258-AA             2        0        1       2         0      0    0    1   2     0     0  0
    

    I hope it helps

    here is the dplyr equivalent

    plouf %>%
      mutate_at(vars(tochange),funs(as.numeric)) %>%
      group_by(VPN) %>%
      summarise_at(vars(tochange),funs(sum(.,na.rm = TRUE)))