My sales dataset includes 3 columns: Countries, Sales Type/Method, Total Quarterly Revenue. Here's a display of the first few rows for a better idea:
Retailer.country Order.method.type Qtr.Rev
<fctr> <fctr> <dbl>
1 Australia E-mail 171407.28
2 Australia Sales visit 2013909.18
3 Australia Special 158795.34
4 Australia Telephone 2289201.87
5 Australia Web 1738303.59
6 Austria Sales visit 66926.18
7 Austria Telephone 1671887.40
8 Austria Web 7050164.50
9 Belgium Sales visit 1655507.05
10 Belgium Web 6222440.26
etc.........
Here's the dput of this data:
structure(list(Retailer.country = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L,
7L, 8L, 8L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 11L, 11L, 11L,
11L, 11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 14L,
14L, 14L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L, 17L, 17L, 17L,
17L, 18L, 18L, 19L, 19L, 19L, 19L, 19L, 20L, 20L, 20L, 20L, 20L,
21L, 21L, 21L, 21L, 21L, 21L), .Label = c("Australia", "Austria",
"Belgium", "Brazil", "Canada", "China", "Denmark", "Finland",
"France", "Germany", "Italy", "Japan", "Korea", "Mexico",
"Netherlands",
"Singapore", "Spain", "Sweden", "Switzerland", "United Kingdom",
"United States"), class = "factor"), Order.method.type =
structure(c(1L,
4L, 5L, 6L, 7L, 4L, 6L, 7L, 4L, 7L, 7L, 1L, 2L, 4L, 7L, 2L, 4L,
6L, 7L, 4L, 7L, 4L, 7L, 2L, 4L, 6L, 7L, 1L, 3L, 4L, 7L, 1L, 2L,
4L, 5L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 4L, 6L, 7L, 4L, 5L, 7L,
2L, 3L, 6L, 7L, 2L, 5L, 6L, 7L, 2L, 3L, 6L, 7L, 1L, 7L, 2L, 4L,
5L, 6L, 7L, 1L, 2L, 4L, 6L, 7L, 2L, 3L, 4L, 5L, 6L, 7L), .Label =
c("E-mail",
"Fax", "Mail", "Sales visit", "Special", "Telephone", "Web"), class =
"factor"),
Qtr.Rev = c(171407.28, 2013909.18, 158795.34, 2289201.87,
1738303.59, 66926.18, 1671887.4, 7050164.5, 1655507.05,
6222440.26,
7746789.52, 6864270.12, 195549.5, 450628.79, 12376528.53,
415128.31, 1453194.14, 2735416.3, 15777880.11, 413978.16,
3776833.13, 308638.6, 12328172.97, 709194.65, 1304167.86,
5897377.14, 11048160.97, 1546079.43, 1247170.05, 2373591.15,
12102240.99, 2461322.51, 165800.42, 1397604.56, 198705.05,
7413833.64, 2662351.94, 289704.5, 680467.87, 87186.72, 343708.86,
1802166.73, 16990817.52, 2821127.32, 431860.34, 10144353.75,
5063353.42, 1725508.54, 3571760.87, 593828.88, 1074860.66,
2981026.86, 5254137.56, 469627.61, 908725.05, 1625096.56,
9677070.09, 88788.41, 337710.73, 254360.21, 7835117.44,
1292812.39,
4818848.86, 217936.39, 792168.42, 790344.28, 109161.04,
4565896.64,
697619.35, 264500.2, 189218.02, 2022968.96, 13756025.4,
1357389.56,
2352483.29, 2842600.85, 685752.21, 13437403.28, 29573813.7
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-79L), .Names = c("Retailer.country", "Order.method.type", "Qtr.Rev"
))
I'm creating a contingency table in R that shows the quarterly revenue generated by each sales method for each country. The final output should look similar to this:
Retailer.country E-mail Fax Mail Sales visit Special Telephone Web TOTAL.cn
1 Australia 171407.3 0.00 0.0 2013909.18 158795.3 2289201.9 1738304 6371617
2 Austria 0.0 0.00 0.0 66926.18 0.0 1671887.4 7050164 8788978
3 Belgium 0.0 0.00 0.0 1655507.05 0.0 0.0 6222440 7877947
4 Brazil 0.0 0.00 0.0 0.00 0.0 0.0 7746790 7746790
5 Canada 6864270.1 195549.50 0.0 450628.79 0.0 0.0 12376529 19886977
6 China 0.0 415128.31 0.0 1453194.14 0.0 2735416.3 15777880 20381619
7 Denmark 0.0 0.00 0.0 413978.16
...
20 United Kingdom 697619.3 264500.20 0.0 189218.02 0.0 2022969.0 13756025 16930332
21 United States 0.0 1357389.56 2352483.3 2842600.85 685752.2 13437403.3 29573814 50249443
22 TOTAL.type 15695863.0 4767448.43 5692692.6 23233800.42 4811539.3 35257926.7 203769190 293228461
The cast() function from the reshape library gets most of the job done, and only leaves the summary column and row for all values to be computed.
cast(sales.by.country, Retailer.country ~ Order.method.type,
fill=0) -> sales.by.country
Summing the rows into a new column named "TOTAL.cn" is pretty simple:
sales.by.country$TOTAL.cn <- rowSums(sales.by.country[,c(2:8)])
But summing the columns become a major headache, because the first component of that last row must either be a factor or character. I converted the first column "Retailer.country" to the character type, because it really serves as nothing more than a visual label.
After fussing around with several functions, this is the best code I was able to create to achieve the intended row summation:
# Sum the numeric columns, which is everything *except* column 1
total.by.ordertype <- (colSums(sales.by.country[,-1]))
# Create the Total by Order row
total.by.ordertype.row <- list("TOTAL.type", total.by.ordertype[1],
total.by.ordertype[2], total.by.ordertype[3], total.by.ordertype[4],
total.by.ordertype[5], total.by.ordertype[6], total.by.ordertype[7],
total.by.ordertype[8])
# Add the Total by Order row to the bottom of the table
sales.by.country[22, ] <- total.by.ordertype.row
It works and maintains the proper data types among all columns... BUT I figure there has to be a more efficient way, perhaps by using the apply family of functions, something from dplyr, etc. Maybe the only way is to write my own function?
For example, future datasets may have 50+ different sales methods. When creating the list for the "Total by Order" row (above), I had to call out every single cell within the vector, separated by commas, in order to successfully add this to my existing table. Other efforts converted the data types of all the other columns to characters, which messes everything up.
I don't mind copying/pasting "total.by.ordertype" 8 times so much. But what happens when I'm dealing with 50-100 order types? Is there a cleaner method to reproduce any of this?
Thank you!
The cast()
function from the reshape
library can do the whole job. With the parameter margin = TRUE
, all row and column totals will be computed:
reshape::cast(sales.by.country, Retailer.country ~ Order.method.type, fun.aggregate = sum,
fill = 0, margins = TRUE)
Retailer.country E-mail Fax Mail Sales visit Special Telephone Web (all) 1 Australia 171407.3 0.00 0.0 2013909.18 158795.3 2289201.9 1738304 6371617 2 Austria 0.0 0.00 0.0 66926.18 0.0 1671887.4 7050164 8788978 3 Belgium 0.0 0.00 0.0 1655507.05 0.0 0.0 6222440 7877947 4 Brazil 0.0 0.00 0.0 0.00 0.0 0.0 7746790 7746790 5 Canada 6864270.1 195549.50 0.0 450628.79 0.0 0.0 12376529 19886977 6 China 0.0 415128.31 0.0 1453194.14 0.0 2735416.3 15777880 20381619 7 Denmark 0.0 0.00 0.0 413978.16 0.0 0.0 3776833 4190811 8 Finland 0.0 0.00 0.0 308638.60 0.0 0.0 12328173 12636812 9 France 0.0 709194.65 0.0 1304167.86 0.0 5897377.1 11048161 18958901 10 Germany 1546079.4 0.00 1247170.1 2373591.15 0.0 0.0 12102241 17269082 11 Italy 2461322.5 165800.42 0.0 1397604.56 198705.0 0.0 7413834 11637266 12 Japan 2662351.9 289704.50 680467.9 87186.72 343708.9 1802166.7 16990818 22856404 13 Korea 0.0 0.00 0.0 2821127.32 0.0 431860.3 10144354 13397341 14 Mexico 0.0 0.00 0.0 5063353.42 1725508.5 0.0 3571761 10360623 15 Netherlands 0.0 593828.88 1074860.7 0.00 0.0 2981026.9 5254138 9903854 16 Singapore 0.0 469627.61 0.0 0.00 908725.1 1625096.6 9677070 12680519 17 Spain 0.0 88788.41 337710.7 0.00 0.0 254360.2 7835117 8515977 18 Sweden 1292812.4 0.00 0.0 0.00 0.0 0.0 4818849 6111661 19 Switzerland 0.0 217936.39 0.0 792168.42 790344.3 109161.0 4565897 6475507 20 United Kingdom 697619.3 264500.20 0.0 189218.02 0.0 2022969.0 13756025 16930332 21 United States 0.0 1357389.56 2352483.3 2842600.85 685752.2 13437403.3 29573814 50249443 22 (all) 15695863.0 4767448.43 5692692.6 23233800.42 4811539.3 35257926.7 203769190 293228461
Of course, fun.aggregate
has to be specified as well.
The same functionality is also available from the reshape2
package, the successor of reshape
, but about 4 times faster for this small sample size.
reshape2::dcast(sales.by.country, Retailer.country ~ Order.method.type, fun.aggregate = sum,
fill = 0, margins = TRUE)
dcast()
is also available from the data.table
package which claims to be faster than reshape2::dcast()
. Unfortunately, the margins
parameter hasn't been implemented yet (current CRAN version 1.10.4). Therefore, the margins have to be computed separately and combined with the original data:
DT2 <- rbind(
DT,
DT[, .(Qtr.Rev = sum(Qtr.Rev)), by = Retailer.country],
DT[, .(Qtr.Rev = sum(Qtr.Rev)), by = Order.method.type],
DT[, .(Qtr.Rev = sum(Qtr.Rev))],
fill = TRUE
)
dcast(DT2, Retailer.country ~ Order.method.type, fill = 0)
Retailer.country E-mail Fax Mail Sales visit Special Telephone Web NA 1: Australia 171407.3 0.00 0.0 2013909.18 158795.3 2289201.9 1738304 6371617 2: Austria 0.0 0.00 0.0 66926.18 0.0 1671887.4 7050164 8788978 3: Belgium 0.0 0.00 0.0 1655507.05 0.0 0.0 6222440 7877947 4: Brazil 0.0 0.00 0.0 0.00 0.0 0.0 7746790 7746790 5: Canada 6864270.1 195549.50 0.0 450628.79 0.0 0.0 12376529 19886977 6: China 0.0 415128.31 0.0 1453194.14 0.0 2735416.3 15777880 20381619 7: Denmark 0.0 0.00 0.0 413978.16 0.0 0.0 3776833 4190811 8: Finland 0.0 0.00 0.0 308638.60 0.0 0.0 12328173 12636812 9: France 0.0 709194.65 0.0 1304167.86 0.0 5897377.1 11048161 18958901 10: Germany 1546079.4 0.00 1247170.1 2373591.15 0.0 0.0 12102241 17269082 11: Italy 2461322.5 165800.42 0.0 1397604.56 198705.0 0.0 7413834 11637266 12: Japan 2662351.9 289704.50 680467.9 87186.72 343708.9 1802166.7 16990818 22856404 13: Korea 0.0 0.00 0.0 2821127.32 0.0 431860.3 10144354 13397341 14: Mexico 0.0 0.00 0.0 5063353.42 1725508.5 0.0 3571761 10360623 15: Netherlands 0.0 593828.88 1074860.7 0.00 0.0 2981026.9 5254138 9903854 16: Singapore 0.0 469627.61 0.0 0.00 908725.1 1625096.6 9677070 12680519 17: Spain 0.0 88788.41 337710.7 0.00 0.0 254360.2 7835117 8515977 18: Sweden 1292812.4 0.00 0.0 0.00 0.0 0.0 4818849 6111661 19: Switzerland 0.0 217936.39 0.0 792168.42 790344.3 109161.0 4565897 6475507 20: United Kingdom 697619.3 264500.20 0.0 189218.02 0.0 2022969.0 13756025 16930332 21: United States 0.0 1357389.56 2352483.3 2842600.85 685752.2 13437403.3 29573814 50249443 22: NA 15695863.0 4767448.43 5692692.6 23233800.42 4811539.3 35257926.7 203769190 293228461 Retailer.country E-mail Fax Mail Sales visit Special Telephone Web NA