I am trying to cast a dataframe into an other one, see below for the examples:
> start = data.frame(Aa = c('A','A','A','A','a','a','a','a'),Bb = c('B','B','b','b','B','B','b','b'),Cc = c('C','c','C','c','C','c','C','c'),v=c(1,2,3,4,5,6,7,8))
> start
Aa Bb Cc v
1 A B C 1
2 A B c 2
3 A b C 3
4 A b c 4
5 a B C 5
6 a B c 6
7 a b C 7
8 a b c 8
And I would like to have a data frame like this one:
1 A B 3
2 A b 7
3 a B 11
4 a b 15
5 B C 6
6 B c 8
7 b C 10
8 b c 12
9 A C 4
10 A c 6
11 a C 12
12 a c 14
Where line 1 is calculated because we have A-B-C -> 1 and A-B-c -> 2 so A-B -> 3
The fact is that I can imagine a solution with some for loops on the columns, but I need it to time efficient, I can have 100,000 rows and up to 100 columns so I need something fast, and I don't think that the for loop are really efficient in R.
Do you have any ideas?
Thanks you!
Perhaps you can use combn
on the column names.
Here, I've used data.table
for its efficient aggregation and for the convenience of rbindlist
to put the data back together.
library(data.table)
setDT(start)
rbindlist(combn(names(start)[1:3], 2, FUN = function(x) {
start[, sum(v), x]
}, simplify = FALSE))
# Aa Bb V1
# 1: A B 3
# 2: A b 7
# 3: a B 11
# 4: a b 15
# 5: A C 4
# 6: A c 6
# 7: a C 12
# 8: a c 14
# 9: B C 6
# 10: B c 8
# 11: b C 10
# 12: b c 12