Tags: r, dplyr, tidyverse, apply, purrr

Unable to access nested data elements inside mutate


I am trying to understand why the following code doesn't work. My understanding is that it will take data$Sepal.Length (the Sepal.Length column inside the nested data column) and iterate over that vector, applying sum.

library(tidyverse)

df <- iris %>%
    nest(data = -Species) %>%
    mutate(Total.Sepal.Length = map_dbl(data$Sepal.Length, sum, na.rm = TRUE))
print(df)

But this throws the error "Total.Sepal.Length must be size 3 or 1, not 0". The following code works, using an anonymous function, which is how the nested data is usually accessed:

df <- iris %>%
    nest(data = -Species) %>%
    mutate(Total.Sepal.Length = map_dbl(data, function(x) sum(x$Sepal.Length, na.rm = TRUE)))
print(df)

Why does the first version fail, even though I believe I am passing the arguments to mutate and map correctly?


Solution

  • You should map over the data column itself and extract Sepal.Length inside the function:

    df <- iris %>%
      nest(data = -Species) %>%
      mutate(Total.Sepal.Length = map_dbl(data, ~sum(.x$Sepal.Length, na.rm = TRUE)))
    

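    Two things. First: is there any reason you're not using group_by()? For a simple per-species total, group_by() plus summarise() gives the same result without nesting.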

    Second: your initial mutate is trying to perform:

     map_dbl(df$data$Sepal.Length, sum, na.rm = TRUE)
    

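    This returns an empty result because df$data$Sepal.Length is NULL: df$data is a list of tibbles, and $ looks for a list element named Sepal.Length, which doesn't exist. map_dbl() over NULL returns a zero-length vector, which mutate() can't fit into a 3-row tibble, hence the error. You have to index into each list element to reach its columns, so df$data[[1]]$Sepal.Length works.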

    Output:

    # A tibble: 3 × 3
      Species    data              Total.Sepal.Length
      <fct>      <list>                         <dbl>
    1 setosa     <tibble [50 × 4]>               250.
    2 versicolor <tibble [50 × 4]>               297.
    3 virginica  <tibble [50 × 4]>               329.
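To see the failure step by step, here is a minimal sketch. It assumes tidyr >= 1.0 (for the nest(data = -Species) syntax) and uses purrr's string shorthand in map(), which extracts the element with that name from each list item. The object names (cols_via_dollar, empty, sepal_lists, totals, df2) are illustrative only. It shows that data$Sepal.Length is NULL, that mapping over NULL gives a zero-length vector, and that pulling the column out of each tibble first lets the original map_dbl(..., sum) idea work.

```r
library(dplyr)
library(tidyr)
library(purrr)

# One row per species; `data` is a list column holding one tibble per species
df <- iris %>% nest(data = -Species)

# `$` on the list column looks for an element *named* Sepal.Length,
# finds none, and returns NULL
cols_via_dollar <- df$data$Sepal.Length
print(is.null(cols_via_dollar))

# Mapping over NULL yields a zero-length double vector, which mutate()
# cannot recycle to the 3 rows of df ("must be size 3 or 1, not 0")
empty <- map_dbl(cols_via_dollar, sum, na.rm = TRUE)
print(length(empty))

# Extract the column from each tibble first: a list of 3 numeric vectors.
# Mapping sum() over that list then works as originally intended.
sepal_lists <- map(df$data, "Sepal.Length")
totals <- map_dbl(sepal_lists, sum, na.rm = TRUE)
print(totals)  # approximately 250.3 296.8 329.4

# The same idea inside mutate()
df2 <- df %>%
  mutate(Total.Sepal.Length = map_dbl(map(data, "Sepal.Length"), sum, na.rm = TRUE))
print(df2)
```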
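On the group_by() question: for a plain per-species total, a sketch like the following (assuming dplyr is installed; totals_by_species is an illustrative name) computes the same sums without creating a list column at all.

```r
library(dplyr)

# Sum Sepal.Length within each species directly, without nesting
totals_by_species <- iris %>%
  group_by(Species) %>%
  summarise(Total.Sepal.Length = sum(Sepal.Length, na.rm = TRUE))

print(totals_by_species)
```

Nesting may still be worth it when you want to keep each species' full data alongside the computed value; for a single summary number per group, group_by() is usually the simpler choice.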