I am working with a database of all transit systems in America and trying to compare different agencies. Each case is a specific part of an organization. For example, bus lines are separate from subways. I would like to combine the values for all of the cases for a given agency.
Basically, I want to sum each column within each "Trs_Id" so that every agency ends up as a single row, and drop the individual per-mode rows. This data frame is the breakdown of operating expenses ("opex"). Here is what my dataset looks like in R:
> colnames(opex)
[1] "Trs_Id" "Mode_Cd"
[3] "Service_Cd" "Expense_Category_Desc"
[5] "Op_Sal_Wage_Amt" "Other_Sal_Wage_Amt"
[7] "Fringe_Benefit_Amt" "Service_Costs_Amt"
[9] "Fuel_Lubricant_Amt" "Tire_Tube_Amt"
[11] "Other_Mat_Sup_Amt" "Utility_Amt"
[13] "Casuality_Liability_Amt" "Tax_Amt"
[15] "In_Report_Amt" "Sep_Report_Amt"
[17] "Misc_Expense_Amt" "Expense_Transfer_Amt"
[19] "Ada_Related_Amt"
> NROW(opex)
[1] 6956
> id_nm <- table(opex$Trs_Id)
> NROW(id_nm)
[1] 616
Although you didn't provide a reproducible example that demonstrates your problem, I think you can solve it with the ddply function from the plyr package. You can also use the base function aggregate, but I prefer to do it with ddply.
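For comparison, here is a minimal sketch of the base aggregate route. It uses a tiny hypothetical data frame called toy (the values are invented; the column names only mirror the example further down). Character columns such as Mode_Cd can't be summed, so the sketch keeps only the numeric columns before grouping by Trs_Id.

```r
# Tiny hypothetical data set; the values are invented for illustration
toy <- data.frame(
  Trs_Id = rep(c("001", "002"), each = 2),
  Mode_Cd = c("ZG", "KU", "ZG", "BZ"),
  Op_Sal_Wage_Amt = c(100, 200, 300, 400),
  Other_Sal_Wage = c(10, 20, 30, 40)
)

# Keep only the numeric columns; Trs_Id and Mode_Cd are character and are excluded
num_cols <- names(toy)[sapply(toy, is.numeric)]

# Sum every numeric column within each Trs_Id (groups come back sorted)
totals <- aggregate(toy[num_cols],
                    by = list(Trs_Id = toy$Trs_Id),
                    FUN = sum)
totals
# Trs_Id 001 -> 300 and 30; Trs_Id 002 -> 700 and 70
```

The by = list(Trs_Id = ...) naming is what gives the grouping column its name in the result; without it, aggregate calls it Group.1.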
Here are some random numbers that mimic your data.frame:
#install.packages('plyr')
library(plyr)
set.seed(007) # makes the example reproducible
Mode_Cd1 <- replicate(4,sample(LETTERS[1:26], 1, replace=TRUE)) # random letters used to build Mode_Cd
Mode_Cd2 <- replicate(4,sample(LETTERS[1:26], 1, replace=TRUE))
data_set <- data.frame(Trs_Id = rep(paste('00', 1:4, sep=''), each=3),
Mode_Cd = sample(paste(Mode_Cd1, Mode_Cd2, sep=''), 12, replace=TRUE),
Service_Cd = sample(paste(Mode_Cd2, Mode_Cd1, sep=''), 12, replace=TRUE),
Op_Sal_Wage_Amt = rnorm(12,5000,100),
Other_Sal_Wage = rnorm(12,3000,800))
data_set # this is what my random data_set looks like
Trs_Id Mode_Cd Service_Cd Op_Sal_Wage_Amt Other_Sal_Wage
1 001 ZG ID 4910.620 2213.558
2 001 KU UK 4969.267 2779.149
3 001 ZG ZB 4999.518 2303.319
4 002 ZG ZB 5098.816 3574.968
5 002 BZ ZB 5083.975 3088.522
6 002 ZG GZ 5070.534 2937.227
7 003 KU ID 5130.596 2663.608
8 003 ZG UK 4861.200 2550.299
9 003 DI ZB 5127.292 3798.011
10 004 ZG UK 5018.419 2115.896
11 004 BZ ID 5075.228 2886.170
12 004 KU UK 5059.175 3251.996
ddply(data_set,.(Trs_Id),numcolwise(sum)) # The sum you want: every numeric column summed within each Trs_Id
Trs_Id Op_Sal_Wage_Amt Other_Sal_Wage
1 001 14879.40 7296.026
2 002 15253.33 9600.717
3 003 15119.09 9011.918
4 004 15152.82 8254.062
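On the real opex table you could also stay in base R with rowsum(), which adds up the rows of a numeric matrix within each group. The sketch below uses a made-up miniature of opex (the IDs, modes and amounts are hypothetical) and ends with two sanity checks: the result should have one row per Trs_Id (616 in your data), and each column's grand total should be unchanged by the collapse.

```r
# Hypothetical miniature of opex; the real table has 6956 rows and 616 Trs_Id values
opex <- data.frame(
  Trs_Id = c("A1", "A1", "B2", "C3", "C3", "C3"),
  Mode_Cd = c("MB", "HR", "MB", "MB", "DR", "CR"),
  Op_Sal_Wage_Amt = c(10, 5, 7, 1, 2, 3),
  Fuel_Lubricant_Amt = c(2, 0, 4, 1, 1, 1)
)

# Only the *_Amt columns are numeric, so they are the ones that get summed
amt_cols <- names(opex)[sapply(opex, is.numeric)]

# rowsum() sums matrix rows within each group; the group labels become row names
by_agency <- rowsum(as.matrix(opex[amt_cols]), group = opex$Trs_Id)

# Turn the row names back into an ordinary Trs_Id column
by_agency <- data.frame(Trs_Id = rownames(by_agency), by_agency, row.names = NULL)

# Sanity checks: one row per agency, and the column totals are preserved
stopifnot(nrow(by_agency) == length(unique(opex$Trs_Id)))
stopifnot(isTRUE(all.equal(colSums(by_agency[amt_cols]), colSums(opex[amt_cols]))))
by_agency
```

If some amount columns contain NA, rowsum() accepts na.rm = TRUE; otherwise any agency with a missing value gets NA for that column.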