r, dplyr, tidyverse, magrittr

Unexpected dplyr::bind_rows() behavior


Short Version:

I'm encountering an error with dplyr::bind_rows() that I don't understand. I want to split my data based on some condition (e.g. a == 1), operate on one part (e.g. b = b * 10), and bind it back to the other part using dplyr::bind_rows() in a single pipe chain. It works fine if I supply the data to the two parts explicitly, but if I instead pipe it in with . it complains about the data type of argument 2.

Here's an MRE of the issue:

library(tidyverse)

# sim data
d <- tibble(a = 1:4, b = 1:4)

# works when 'd' is supplied directly to bind_rows()
bind_rows(d %>% filter(a == 1),
          d %>% filter(!a == 1) %>% mutate(b = b * 10))
#> # A tibble: 4 x 2
#>       a     b
#>   <int> <dbl>
#> 1     1     1
#> 2     2    20
#> 3     3    30
#> 4     4    40


# fails when 'd' is piped in to bind_rows()
d %>%
  bind_rows(. %>% filter(a == 1),
            . %>% filter(!a == 1) %>% mutate(b = b * 10))
#> Error: Argument 2 must be a data frame or a named atomic vector.

Long Version:

If I capture what the bind_rows() call is getting as input as a list() instead, I can see that two unexpected (to me) things are happening.

  1. Instead of evaluating the pipe chains I provided, it seems to just capture them as functional sequences.
  2. The input (.) is invisibly being provided in addition to the two explicit arguments, so I get 3 items instead of 2 in the list.

# capture intermediate values for diagnostics
d %>%
  list(. %>% filter(a == 1),
            . %>% filter(!a == 1) %>% mutate(b = b * 10))
#> [[1]]
#> # A tibble: 4 x 2
#>       a     b
#>   <int> <int>
#> 1     1     1
#> 2     2     2
#> 3     3     3
#> 4     4     4
#> 
#> [[2]]
#> Functional sequence with the following components:
#> 
#>  1. filter(., a == 1)
#> 
#> Use 'functions' to extract the individual functions. 
#> 
#> [[3]]
#> Functional sequence with the following components:
#> 
#>  1. filter(., !a == 1)
#>  2. mutate(., b = b * 10)
#> 
#> Use 'functions' to extract the individual functions.
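
The first observation can be reproduced in isolation. The following is a minimal sketch, assuming dplyr is attached (it re-exports %>% and tibble()); the name keep_first is made up for illustration. It suggests that a pipe chain starting with a bare . is stored as a function rather than evaluated, which is why bind_rows() rejects it.

```r
library(dplyr)

d <- tibble(a = 1:4, b = 1:4)

# A chain whose left-hand side is a bare `.` is not run;
# magrittr returns a function (a "functional sequence") instead.
keep_first <- . %>% filter(a == 1)   # hypothetical name, for illustration only
is.function(keep_first)
#> [1] TRUE

# Calling that function on the data runs the stored steps.
first_part <- keep_first(d)
nrow(first_part)
#> [1] 1
```

So the objects in positions 2 and 3 of the list above are functions waiting for input, not data frames.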

This leads me to the following inelegant solution. I solve the first problem by passing . directly to the inner function (filter(., ...)), which seems to force evaluation correctly (for reasons I don't understand), and I solve the second problem by dropping the first list element before calling bind_rows().

# hack solution to force eval and clean duplicated input
d %>%
  list(filter(., a == 1),
       filter(., !a == 1) %>% mutate(b = b * 10)) %>%
  .[-1] %>% 
  bind_rows()
#> # A tibble: 4 x 2
#>       a     b
#>   <int> <dbl>
#> 1     1     1
#> 2     2    20
#> 3     3    30
#> 4     4    40

Created on 2022-01-24 by the reprex package (v2.0.1)

It seems like it might be related to this issue, but I can't quite see how. It would be great to understand why this is happening and to find a way to code this without assigning intermediate variables or resorting to this weird hack of subsetting the intermediate list.


EDIT:

Knowing this was related to curly braces ({}) enabled me to find a few more helpful links: 1, 2, 3


Solution

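  • If we want to use ., wrap the call in curly braces ({}). The outer braces stop %>% from also inserting the left-hand side as the first argument, and writing {.} instead of a bare . on the left of the inner pipes makes them evaluate immediately rather than create a functional sequence.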

    library(dplyr)
    d %>% {
      bind_rows({.} %>% filter(a == 1),
                {.} %>% filter(!a == 1) %>% mutate(b = b * 10))
    }
    

    Output:

    # A tibble: 4 × 2
          a     b
      <int> <dbl>
    1     1     1
    2     2    20
    3     3    30
    4     4    40
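
The effect of the braces can be checked with list(), as in the question. This is a minimal sketch under the same setup; the names without_braces, with_braces, and result are made up for illustration, and a != 1 is used as an equivalent spelling of !a == 1. It also shows a variant that passes . straight to filter() inside the braces, which avoids the {.} wrapper.

```r
library(dplyr)

d <- tibble(a = 1:4, b = 1:4)

# Without braces, `.` only appears inside nested calls, so magrittr
# also passes `d` as the first argument: three list elements.
without_braces <- d %>% list(filter(., a == 1), filter(., a != 1))
length(without_braces)
#> [1] 3

# With braces, `d` is used only where `.` is written: two elements.
with_braces <- d %>% { list(filter(., a == 1), filter(., a != 1)) }
length(with_braces)
#> [1] 2

# The same idea applied to the original problem.
result <- d %>% {
  bind_rows(filter(., a == 1),
            filter(., a != 1) %>% mutate(b = b * 10))
}
result$b
#> [1]  1 20 30 40
```

Which spelling to prefer is a matter of taste; both rely on the braces to keep the piped-in data out of the first argument position.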