Using the information on hand I need to predict how much of a particular product we need next month. I have several months worth of data going back, however the data is separated by both VPN and by a separate warehouse number. I just need to know how much to order in general and ignore the warehouse separation. we'll be adding that back in later.
There are multiple duplicates of many of the VPN's and i would like to consolidate all the duplicates and also sum the numbers that have been separated.
VPN Month To Date December November October September August July June May April March
0A36227-AA 15 6 4 2 NA 4 6 4 2 <NA> 4
0A36227-AA NA 1 NA NA NA NA 1 <NA> <NA> <NA> <NA>
0A36227-AA 2 3 1 NA 2 3 3 1 <NA> 2 3
0A36258-AA NA NA NA 1 NA NA <NA> <NA> 1 <NA> <NA>
0A36258-AA 1 NA 1 NA NA NA <NA> 1 <NA> <NA> <NA>
0A36258-AA NA NA NA 1 NA NA <NA> <NA> 1 <NA> <NA>
0A36258-AA 1 NA NA NA NA NA <NA> <NA> <NA> <NA> <NA>
So i want to combine all the duplicates and add all the numbers from the rows into just one row per VPN.
I've tried using the aggregate function and it didn't work for me. i may have used it wrong though.
any help would be appreciated!
also there are some cases where it may cause an infinite number to show up. if anyone has any further advice for how to handle that it would be welcome.
You basically want to know how to perform sum while grouping in your data frame. You will find plenty of answer. I have a data.table solution for your case:
plouf <- read.table(text = " VPN Month.To.Date December November October September August July June May April March
0A36227-AA 15 6 4 2 NA 4 6 4 2 <NA> 4
0A36227-AA NA 1 NA NA NA NA 1 <NA> <NA> <NA> <NA>
0A36227-AA 2 3 1 NA 2 3 3 1 <NA> 2 3
0A36258-AA NA NA NA 1 NA NA <NA> <NA> 1 <NA> <NA>
0A36258-AA 1 NA 1 NA NA NA <NA> 1 <NA> <NA> <NA>
0A36258-AA NA NA NA 1 NA NA <NA> <NA> 1 <NA> <NA>
0A36258-AA 1 NA NA NA NA NA <NA> <NA> <NA> <NA> <NA>",
stringsAsFactors = FALSE, header = TRUE)
here is the code
DT <- setDT(plouf)
tochange <- names(DT)[!names(DT) %in% "VPN"]
here the tochange vector is the list of your column you want to average
DT[,c(tochange) := lapply(.SD,function(x){as.numeric(x)}),.SDcols = tochange]
DT[,lapply(.SD,function(x){sum(x,na.rm = TRUE)}),.SDcols = tochange,by = VPN]
The first line is to set everything to numeric¨
The second line perform the sum ignoring the NAs and grouping by VPN. I am not 100% sure that is what you wanted.
VPN Month.To.Date December November October September August July June May April March i
1: 0A36227-AA 17 10 5 2 2 7 10 5 2 2 7 10
2: 0A36258-AA 2 0 1 2 0 0 0 1 2 0 0 0
I hope it helps
here is the dplyr equivalent
plouf %>%
mutate_at(vars(tochange),funs(as.numeric)) %>%
group_by(VPN) %>%
summarise_at(vars(tochange),funs(sum(.,na.rm = TRUE)))