Search code examples
rdplyrmagrittrbroom

dot notation in magrittr pipe not be


I thought I understood that in conjunction with the magrittr pipe, the dot-notation indicates where the dataset that is piped into a function should go for evaluation. When I was starting to work with purrr/broom to generate some nested dataframes with the linear models I was generating by group I ran into a problem. When using the dot notation it seems that my prior group_by command was being ignored. Took me a while to figure out that I should simply omit the dot-notation and it works like expected, but I would like to understand why it is not working.

Here is the sample code that I expected to generate identical data, but only the first example is generating linear models by group, while the second generates the model for the whole dataset, but then still stores it at the group level.

#// library and data prep
library(tidyverse)
library(broom)
data <- as_tibble(mtcars)

#// generates lm fit for the model by group
data %>%
    #// group by factor
    group_by(carb) %>%
    #// summary for the grouped dataset
    summarize(new = list( tidy( lm(formula = drat ~ mpg)))) %>%
    #// unnest
    unnest(cols = new) 
#> Warning in summary.lm(x): essentially perfect fit: summary may be unreliable
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 12 x 6
#>     carb term         estimate  std.error  statistic    p.value
#>    <dbl> <chr>           <dbl>      <dbl>      <dbl>      <dbl>
#>  1     1 (Intercept)  1.72e+ 0   5.85e- 1   2.94e+ 0   3.24e- 2
#>  2     1 mpg          7.75e- 2   2.26e- 2   3.44e+ 0   1.85e- 2
#>  3     2 (Intercept)  1.44e+ 0   5.87e- 1   2.46e+ 0   3.95e- 2
#>  4     2 mpg          1.01e- 1   2.55e- 2   3.95e+ 0   4.26e- 3
#>  5     3 (Intercept)  3.07e+ 0   6.86e-15   4.48e+14   1.42e-15
#>  6     3 mpg          3.46e-17   4.20e-16   8.25e- 2   9.48e- 1
#>  7     4 (Intercept)  2.18e+ 0   4.29e- 1   5.07e+ 0   9.65e- 4
#>  8     4 mpg          8.99e- 2   2.65e- 2   3.39e+ 0   9.43e- 3
#>  9     6 (Intercept)  3.62e+ 0 NaN        NaN        NaN       
#> 10     6 mpg         NA         NA         NA         NA       
#> 11     8 (Intercept)  3.54e+ 0 NaN        NaN        NaN       
#> 12     8 mpg         NA         NA         NA         NA

#// generates lm fit for the whole model
data %>%
    #// group by factor
    group_by(carb) %>%
    #// summary for the whole dataset
    summarize(new = list( tidy( lm(formula = drat ~ mpg, data = .)))) %>%
    #// unnest
    unnest(cols = new)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 12 x 6
#>     carb term        estimate std.error statistic  p.value
#>    <dbl> <chr>          <dbl>     <dbl>     <dbl>    <dbl>
#>  1     1 (Intercept)   2.38      0.248       9.59 1.20e-10
#>  2     1 mpg           0.0604    0.0119      5.10 1.78e- 5
#>  3     2 (Intercept)   2.38      0.248       9.59 1.20e-10
#>  4     2 mpg           0.0604    0.0119      5.10 1.78e- 5
#>  5     3 (Intercept)   2.38      0.248       9.59 1.20e-10
#>  6     3 mpg           0.0604    0.0119      5.10 1.78e- 5
#>  7     4 (Intercept)   2.38      0.248       9.59 1.20e-10
#>  8     4 mpg           0.0604    0.0119      5.10 1.78e- 5
#>  9     6 (Intercept)   2.38      0.248       9.59 1.20e-10
#> 10     6 mpg           0.0604    0.0119      5.10 1.78e- 5
#> 11     8 (Intercept)   2.38      0.248       9.59 1.20e-10
#> 12     8 mpg           0.0604    0.0119      5.10 1.78e- 5

Created on 2021-01-04 by the reprex package (v0.3.0)


Solution

  • . in this case refers to data which is present in the previous step which is (data %>% group_by(carb)). Although the data is grouped it is still complete data. If you are on dplyr > 1.0.0 you could use cur_data() to refer to the data in the group.

    library(dplyr)
    library(broom)
    library(tidyr)
    
    data %>%
      group_by(carb) %>%
      summarize(new = list(tidy(lm(formula = drat ~ mpg, data = cur_data())))) %>%
      unnest(cols = new)
    

    This gives the same output as your first example.