I am practising in pivoting data.table
, example below with lapply
to get substructure features and their corresponding columns where ordering matters for the key column, this could be seen as pivoting in spreadsheet tools. This is similar to pivot-table and perhaps R tool called cube, other subtotal tools may be easier to use than lapply
. I am open to comparisons to more convenient tools, lapply
seems to become quite terse when you need more functions and particularly with different orders.
Demo
I form new subcategories with paste0
in data.table
over which I want to calculate features, such as mean and sum. Small example is below with input and intended output.
Input
> data.table(cn=c('a','a','b','b','c'),test=c(1,2,3,4,0),cat=c(1,1,1,2,2))
cn test cat
1: a 1 1
2: a 2 1
3: b 3 1
4: b 4 2
5: c 0 2
> myData <- data.table(cn=c('a','a','b','b','c'),test=c(1,2,3,4,0),cat=c(1,1,1,2,2))
> myData[order(test), lapply(.SD, paste0, collapse = '>'), by = c('cat')]
cat cn test
1: 2 c>b 0>4
2: 1 a>a>b 1>2>3
#How can I add here new functions to get new columns for the segments such as a>a>b
Output
cat cn test test_sum test_avg
1: 1 a>a>b 1>2>3 6 2
2: 2 c>b 0>4 4 2
Put more simply, how can we get the output with the input? Is lapply
the correct choice here? Notice that ordering matters for some functions and not for some functions.
something like this?
#set keys for ordering
setkey(DT, cat, test )
#
DT[, .(cn = paste0( cn, collapse = ">"),
test = paste0( test, collapse = ">" ),
test_sum = sum( test ),
test_avg = mean( test ) ),
by = cat ]
# cat cn test test_sum test_avg
# 1: 1 a>a>b 1>2>3 6 2
# 2: 2 c>b 0>4 4 2
But, you can also sort inside the codeline, using sort()
Below, I sorted text
in decreasing order (just to show it can be done).
DT[, .(cn = paste0( cn, collapse = ">"),
test = paste0( sort( test, decreasing = TRUE ), collapse = ">" ),
test_sum = sum( test ),
test_avg = mean( test ) ),
by = cat ]
# cat cn test test_sum test_avg
# 1: 1 a>a>b 3>2>1 6 2
# 2: 2 c>b 4>0 4 2