I have a column of type "list" in my data frame, I want to create a column with the sum.
I guess there is no visual difference, but my column consists of list(1,2,3)
s and not c(1,2,3)
s :
tibble(
MY_DATA = list(
list(2, 7, 8),
list(3, 10, 11),
list(4, 2, 8)
),
NOT_MY_DATA = list(
c(2, 7, 8),
c(3, 10, 11),
c(4, 2, 8)
)
)
Unfortunately when I try
mutate(NEW_COL = MY_LIST_COL_D %>% unlist() %>% sum())
the result is that every cell in the new column contains the sum of the entire source column (so a value in the millions)
I tried reduce
and it did work, but was slow, I was looking for a better solution.
You could use the purrr::map_dbl
, which should return a vector of type double
:
library(tibble)
library(dplyr)
library(purrr)
df = tibble(
MY_LIST_COL_D = list(
c(2, 7, 8),
c(3, 10, 11),
c(4, 2, 8)
)
)
df %>%
mutate(NEW_COL= map_dbl(MY_LIST_COL_D, sum), .keep = 'unused')
# NEW_COL
<dbl>
# 1 17
# 2 24
# 3 14
Is this what you were looking for? If you don't want to remove the list column just disregard the .keep
argument.
Update
With the underlying structure being lists, you can still apply the same logic, but one way to solve the issue is to unlist
:
df = tibble(
MY_LIST_COL_D = list(
list(2, 7, 8),
list(3, 10, 11),
list(4, 2, 8)
)
)
df %>%
mutate(NEW_COL = map_dbl(MY_LIST_COL_D, ~ sum(unlist(.x))), .keep = 'unused')
# NEW_COL
# <dbl>
# 1 17
# 2 24
# 3 14