Search code examples
rlistdatatablecastingmelt

Making lists by group


I have data as follows:

library(data.table)
dat <- structure(list(year2006 = c("1110", "1110", "1110", "1110", "1120", 
"1120", "1120", "1120"), group2006 = c("1", "2", "3", "4", "1", 
"2", "3", "4"), min2006 = c("1.35", "2", "3.7", 
"4.25", "5.6", "4.45", "3.09", "1.13"), 
    year2007 = c("1110", "1110", "1110", "1110", "1120", "1120", 
    "1120", "1120"), group2007 = c("1", "2", "3", "4", "1", 
    "2", "3", "4"), min2007 = c("5", "5.05", "5", 
    "1.59", "2.3", "3", "4.05", "5.16"
    )), row.names = c(NA, -8L), class = c("data.table", "data.frame"
))
dat

   year2006  group2006 min2006 year2007  group2007 min2007
1:     1110          1    1.35     1110          1       5
2:     1110          2       2     1110          2    5.05
3:     1110          3     3.7     1110          3       5
4:     1110          4    4.25     1110          4    1.59
5:     1120          1     5.6     1120          1     2.3
6:     1120          2    4.45     1120          2       3
7:     1120          3    3.09     1120          3    4.05
8:     1120          4    1.13     1120          4    5.16

What I would like to do, is to create a list of the numbers in min200x, per category in year200x.

Desired output:

   cat       year2006                         year2007
1:     1110  c("1.35", "2", "3.7", "4.25")    c("5", "5.05", "5", "1.59") 
2:     1120  c("5.6", "4.45", "3.09", "1.13") c("2.3", "3", "4.05", "5.16")

I thought I could do something like:

setDT(dat)[, cat := list(min2006), by=year2006]

But that does not work (it just puts the min2006 item in a new colum cat). And even if it did, it would only provide a solution for the year 2006. How should I go about this?


Solution

  • I'm not sure why your columns in your test data are all character but the columns in your desired output are numeric. Also, you ask for a list of numbers by group but your expected output shows a vector.

    Nevertheless, here's a tidyverse solution that creates list columns.

    library(tidyverse)
    
    x <- dat %>% 
      mutate(across(everything(), as.numeric)) %>% 
      group_by(year2006) %>% 
      select(year2006, starts_with("min")) %>% 
      summarise(across(everything(), lst))
    x
    # A tibble: 2 × 3
      year2006 min2006      min2007     
         <dbl> <named list> <named list>
    1     1110 <dbl [4]>    <dbl [4]>   
    2     1120 <dbl [4]>    <dbl [4]>
    

    and, for example,

    x$min2006
    $min2006
    [1] 1.35 2.00 3.70 4.25
    
    $min2006
    [1] 5.60 4.45 3.09 1.13
    

    If your inputs are actually numeric, you can lose the mutate.

    Edit

    ... and to get the correct name for the grouping column, you can add %>% rename(cat=year2006) to the pipe. Apologies for the omission.