Suppose I've got sales data for three sales reps selling a variety of products. What makes it difficult is that each rep sells a different combination of products, and not necessarily the same number of them, either:
Bob sells Products A, B, and C
Mike sells Products A, B, C, and D
Sara sells Products A, B, and E
RepName Product SalesDollarAmt SalesQty
1 Bob A 43 3
2 Mike A 14 5
3 Sara A 53 1
4 Bob B 100 35
5 Mike B 215 80
6 Sara B 310 105
7 Bob C 5 8
8 Mike C 10 3
9 Mike D 105 50
10 Sara E 25 18
I want to rotate this on the Product, so that the result looks like this:
RepName Product.1 SalesDollarAmt.1 SalesQty.1 Product.2 SalesDollarAmt.2 SalesQty.2 Product.3 SalesDollarAmt.3 SalesQty.3 Product.4 SalesDollarAmt.4 SalesQty.4
1 Bob A 43 3 B 100 35 C 5 8 <NA> 0 0
2 Mike A 14 5 B 215 80 C 10 3 D 105 50
3 Sara A 53 1 B 310 105 E 25 18 <NA> 0 0
If they all had the same products, I'd have filtered them by Product into separate dataframes and then joined them back together on RepName. I've tried everything I can think of with spread
and dcast
. Thanks for any help!
Code for sample data frames:
library(tidyverse)
# initial sales data
df <- tribble(
~RepName, ~Product, ~SalesDollarAmt, ~SalesQty,
#-------------------------------
"Bob", "A", 43, 3,
"Mike", "A", 14, 5,
"Sara", "A", 53, 1,
"Bob", "B", 100, 35,
"Mike", "B", 215, 80,
"Sara", "B", 310, 105,
"Bob", "C", 5, 8,
"Mike", "C", 10, 3,
"Mike", "D", 105, 50,
"Sara", "E", 25, 18
)
# ideally rotated data
df2 <- tribble(
~RepName, ~Product.1, ~SalesDollarAmt.1, ~SalesQty.1, ~Product.2, ~SalesDollarAmt.2, ~SalesQty.2, ~Product.3, ~SalesDollarAmt.3, ~SalesQty.3, ~Product.4, ~SalesDollarAmt.4, ~SalesQty.4,
#--------------------------------------------------------------
"Bob", "A", 43, 3, "B", 100, 35, "C", 5, 8, NA, 0, 0,
"Mike", "A", 14, 5, "B", 215, 80, "C", 10, 3, "D", 105, 50,
"Sara", "A", 53, 1, "B", 310, 105, "E", 25, 18, NA, 0, 0
)
Using a combination of row_number
, gather
, spread
, and unite
, we can reshape the data. It's up to you to reorder the columns if you so choose. In the last line, we specify convert = TRUE
in the call to spread
. This is due to the fact that when we convert the data to long format (using gather
), the column values are converted to character. Specifying convert = TRUE
in the call to spread
(should) get the values back to a useful form.
df %>%
group_by(RepName) %>%
mutate(product_count = row_number()) %>% # product "id" within RepName
gather(variable, value, -RepName, -product_count) %>% # reshape to long
unite(var_prod, variable, product_count) %>%
spread(var_prod, value, convert = TRUE) # reshape to wide
RepName Product_1 Product_2 Product_3 Product_4 SalesDollarAmt_1 SalesDollarAmt_2 SalesDollarAmt_3 SalesDollarAmt_4 SalesQty_1 SalesQty_2 SalesQty_3 SalesQty_4
1 Bob A B C <NA> 43 100 5 <NA> 3 35 8 <NA>
2 Mike A B C D 14 215 10 105 5 80 3 50
3 Sara A B E <NA> 53 310 25 <NA> 1 105 18 <NA>