Search code examples
rdata.tablepivot-tablelapplysubtotal

Pivoting data.table where order matters for some functions and not for some functions?


I am practising in pivoting data.table, example below with lapply to get substructure features and their corresponding columns where ordering matters for the key column, this could be seen as pivoting in spreadsheet tools. This is similar to pivot-table and perhaps R tool called cube, other subtotal tools may be easier to use than lapply. I am open to comparisons to more convenient tools, lapply seems to become quite terse when you need more functions and particularly with different orders.

Demo

I form new subcategories with paste0 in data.table over which I want to calculate features, such as mean and sum. Small example is below with input and intended output.

Input

> data.table(cn=c('a','a','b','b','c'),test=c(1,2,3,4,0),cat=c(1,1,1,2,2))
   cn test cat
1:  a    1   1
2:  a    2   1
3:  b    3   1
4:  b    4   2
5:  c    0   2
> myData <- data.table(cn=c('a','a','b','b','c'),test=c(1,2,3,4,0),cat=c(1,1,1,2,2))
> myData[order(test), lapply(.SD, paste0, collapse = '>'), by = c('cat')]  
   cat    cn  test
1:   2   c>b   0>4
2:   1 a>a>b 1>2>3  

#How can I add here new functions to get new columns for the segments such as a>a>b

Output

   cat    cn  test  test_sum test_avg
1:   1 a>a>b 1>2>3  6        2
2:   2   c>b   0>4  4        2

Put more simply, how can we get the output with the input? Is lapply the correct choice here? Notice that ordering matters for some functions and not for some functions.

How to pivot data.table properly where some functions needs order and some not?


Solution

  • something like this?

    #set keys for ordering
    setkey(DT, cat, test )
    #
    DT[, .(cn       = paste0( cn, collapse = ">"),
           test     = paste0( test, collapse = ">" ),
           test_sum = sum( test ),
           test_avg = mean( test ) ), 
       by = cat ]
    
    #    cat    cn  test test_sum test_avg
    # 1:   1 a>a>b 1>2>3        6        2
    # 2:   2   c>b   0>4        4        2
    

    But, you can also sort inside the codeline, using sort()
    Below, I sorted text in decreasing order (just to show it can be done).

    DT[, .(cn       = paste0( cn, collapse = ">"),
           test     = paste0( sort( test, decreasing = TRUE ), collapse = ">" ),
           test_sum = sum( test ),
           test_avg = mean( test ) ), 
       by = cat ]
    
    #    cat    cn  test test_sum test_avg
    # 1:   1 a>a>b 3>2>1        6        2
    # 2:   2   c>b   4>0        4        2