r dplyr magrittr

Meaning of error using . shorthand inside dplyr function


I'm getting a dplyr::bind_rows error. It's a very trivial problem, because I can easily get around it, but I'd like to understand the meaning of the error message.

I have the following data of some population groups for New England states, and I'd like to bind on a copy of these same values with the name changed to "New England," so that I can group by name and add them up, giving me values for the individual states, plus an overall value for the region.

df <- structure(list(name = c("CT", "MA", "ME", "NH", "RI", "VT"), 
        estimate = c(501074, 1057316, 47369, 76630, 141206, 27464)),
        class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

I'm doing this as part of a much larger flow of piped steps, so I can't just do bind_rows(df, df %>% mutate(name = "New England")). The magrittr pipe (re-exported by dplyr) provides the convenient . shorthand for the data frame being piped from one function to the next, but I can't use it to bind the data frame to itself in the way I'd like.

What does work and gets me the output I want:

library(tidyverse)

df %>%
  # arbitrary piped operation
  mutate(name = str_to_lower(name)) %>%
  bind_rows(mutate(., name = "New England")) %>%
  group_by(name) %>%
  summarise(estimate = sum(estimate))
#> # A tibble: 7 x 2
#>   name        estimate
#>   <chr>          <dbl>
#> 1 ct            501074
#> 2 ma           1057316
#> 3 me             47369
#> 4 New England  1851059
#> 5 nh             76630
#> 6 ri            141206
#> 7 vt             27464

But when I try to do the same thing by piping the . shorthand into mutate, I get this error:

df %>%
  mutate(name = str_to_lower(name)) %>%
  bind_rows(. %>% mutate(name = "New England"))
#> Error in bind_rows_(x, .id): Argument 2 must be a data frame or a named atomic vector, not a fseq/function

Like I said, doing it the first way is fine, but I'd like to understand the error because I write a lot of multi-step piped code.


Solution

  • As @aosmith noted in the comments, it's due to the way magrittr parses the dot in this case:

    from ?'%>%':

    Using the dot-place holder as lhs

    When the dot is used as lhs, the result will be a functional sequence, i.e. a function which applies the entire chain of right-hand sides in turn to its input.
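
    The behaviour described in that help text can be seen in isolation. The following is a minimal sketch using only magrittr and base R; the names shout, x and res are made up for illustration and are not from the question.

    ```r
    library(magrittr)

    # With a bare dot on the left, the pipe builds a function instead of a value.
    shout <- . %>% toupper() %>% paste0("!")
    is.function(shout)     # TRUE
    shout("ct")            # "CT!"

    # The same happens when the expression is nested inside another pipe's call:
    # x is still inserted as the first argument of list(), and the second
    # element is a function rather than a transformed copy of x.
    x <- c("ct", "ma")
    res <- x %>% list(. %>% toupper())
    is.function(res[[2]])  # TRUE
    ```

    This is what bind_rows() receives as its second argument in the failing call, which would explain why the error message complains about a function.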

    To avoid triggering this, any modification of the expression on the lhs will do:

    df %>%
      mutate(name = str_to_lower(name)) %>%
      bind_rows((.) %>% mutate(name = "New England"))
    
    df %>%
      mutate(name = str_to_lower(name)) %>%
      bind_rows({.} %>% mutate(name = "New England"))
    
    df %>%
      mutate(name = str_to_lower(name)) %>%
      bind_rows(identity(.) %>% mutate(name = "New England"))
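
    As a check, any of these forms should be able to continue into the group_by() and summarise() steps from the question. The sketch below uses the parenthesised variant and rebuilds df with tibble() so that it runs on its own; totals is a name introduced here for illustration.

    ```r
    library(dplyr)
    library(stringr)

    # Same data as in the question, rebuilt so the example is self-contained
    df <- tibble(
      name = c("CT", "MA", "ME", "NH", "RI", "VT"),
      estimate = c(501074, 1057316, 47369, 76630, 141206, 27464)
    )

    totals <- df %>%
      mutate(name = str_to_lower(name)) %>%
      # (.) is evaluated as a value, so the inner pipe returns a data frame
      bind_rows((.) %>% mutate(name = "New England")) %>%
      group_by(name) %>%
      summarise(estimate = sum(estimate))

    # The regional row is the sum of the six state rows
    filter(totals, name == "New England")$estimate  # 1851059
    ```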
    

    Here's a suggestion that avoids the problem altogether:

    df %>%
      # arbitrary piped operation
      mutate(name = str_to_lower(name)) %>%
      replicate(2,.,simplify = FALSE) %>%
      map_at(2,mutate_at,"name",~"New England") %>%
      bind_rows
    
    # # A tibble: 12 x 2
    #    name        estimate
    #    <chr>          <dbl>
    #  1 ct            501074
    #  2 ma           1057316
    #  3 me             47369
    #  4 nh             76630
    #  5 ri            141206
    #  6 vt             27464
    #  7 New England   501074
    #  8 New England  1057316
    #  9 New England    47369
    # 10 New England    76630
    # 11 New England   141206
    # 12 New England    27464