r, purrr, method-chaining

Sum of a list in R data frame


I have a column of type "list" in my data frame, and I want to create a new column containing the sum of each row's list.


There may be no visual difference when printed, but my column consists of list(1, 2, 3) elements, not c(1, 2, 3) vectors:

tibble(
  MY_DATA = list(
    list(2, 7, 8),
    list(3, 10, 11),
    list(4, 2, 8)
  ),
  NOT_MY_DATA = list(
    c(2, 7, 8),
    c(3, 10, 11),
    c(4, 2, 8)
  )    
)


Unfortunately, when I try mutate(NEW_COL = MY_LIST_COL_D %>% unlist() %>% sum()), every cell in the new column contains the sum of the entire source column (a value in the millions).
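The behaviour described above can be reproduced in a minimal base R sketch. It assumes that, without row-wise grouping, mutate() hands the expression the whole list column at once, so unlist() flattens every row together before sum() runs. The name my_list_col is a hypothetical stand-in for the MY_LIST_COL_D column, using the small example values from the question.

```r
# Hypothetical stand-in for the list column from the question
my_list_col <- list(
  list(2, 7, 8),
  list(3, 10, 11),
  list(4, 2, 8)
)

# Flattening the whole column first gives one grand total,
# which mutate() would then recycle into every row
whole_column_total <- sum(unlist(my_list_col))

# Flattening and summing one element at a time gives one value per row
per_row_totals <- vapply(
  my_list_col,
  function(x) sum(unlist(x)),
  numeric(1)
)

print(whole_column_total)  # 55
print(per_row_totals)      # 17 24 14
```

The difference is only in where the flattening happens: once over the whole column, or once per element.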

I tried reduce, which worked but was slow, so I am looking for a better solution.


Solution

  • You could use purrr::map_dbl, which applies a function to each element and returns a vector of type double:

    library(tibble)
    library(dplyr)
    library(purrr)
    df = tibble(
      MY_LIST_COL_D = list(
        c(2, 7, 8),
        c(3, 10, 11),
        c(4, 2, 8)
      )
    )
    
    df %>% 
      mutate(NEW_COL = map_dbl(MY_LIST_COL_D, sum), .keep = 'unused')
    #   NEW_COL
    #     <dbl>
    # 1      17
    # 2      24
    # 3      14
    

    Is this what you were looking for? If you want to keep the list column, simply omit the .keep argument.

    Update: With the underlying elements being lists rather than numeric vectors, the same logic still applies, but sum() cannot be called on a list directly, so one way to solve this is to unlist each element first:

    df = tibble(
      MY_LIST_COL_D = list(
        list(2, 7, 8),
        list(3, 10, 11),
        list(4, 2, 8)
      )
    )
    
    df %>% 
      mutate(NEW_COL = map_dbl(MY_LIST_COL_D, ~ sum(unlist(.x))), .keep = 'unused')
    #   NEW_COL
    #     <dbl>
    # 1      17
    # 2      24
    # 3      14
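For comparison, the same per-row sum can be sketched in base R with vapply(), which, like map_dbl, checks that each result is a single double. This is only an illustrative alternative, not part of the original answer; the data frame df_base and its id column are hypothetical, and the values are the ones from the example above.

```r
# Hypothetical data frame with a list column (base R, no packages needed)
df_base <- data.frame(id = 1:3)
df_base$MY_LIST_COL_D <- list(
  list(2, 7, 8),
  list(3, 10, 11),
  list(4, 2, 8)
)

# Flatten each element, sum it, and require exactly one numeric value per row
df_base$NEW_COL <- vapply(
  df_base$MY_LIST_COL_D,
  function(x) sum(unlist(x)),
  numeric(1)
)

print(df_base$NEW_COL)  # 17 24 14
```

Because unlist() leaves an atomic vector unchanged, the same function should also work when the elements are c(2, 7, 8) vectors rather than lists.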