
R combinations with dot ("."), "~", and pipe (%>%) operator


I have been looking at a lot of answers and I still can't completely understand them. For example, the clearest one (here), among others (1, 2, 3), gives specific examples of the various uses of the dot, but I cannot understand, for example, its application here:

library(magrittr)  # provides %>% and multiply_by()

car_data <- 
  mtcars %>%
  subset(hp > 100) %>%
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
  transform(kpl = mpg %>% multiply_by(0.4251)) %>%
  print

#result:
  cyl   mpg  disp    hp drat   wt  qsec   vs   am gear carb    kpl
1   4 25.90 108.0 111.0 3.94 2.15 17.75 1.00 1.00 4.50 2.00 11.010
2   6 19.74 183.3 122.3 3.59 3.12 17.98 0.57 0.43 3.86 3.43  8.391
3   8 15.10 353.1 209.2 3.23 4.00 16.77 0.00 0.14 3.29 3.50  6.419

The code above is from an explanation of %>% in magrittr. I'm also trying to understand the pipe operator: I know that it passes along the result of the previous computation, but I get lost in the aggregate line, where . and %>% are mixed inside the same function call.

So I can't understand what the code above does. I have the result (shown above), but I don't get how it reaches that result, especially in the aggregate line, where it uses the dot and the ~ sign. I know that ~ means "all other variables", but what does the dot mean there? Does it have another meaning or application? And what does the pipe operator do inside a specific function?


Solution

  • That line uses the . in three different ways.

             [1]             [2]      [3]
    aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))
    

    Generally speaking, the . marks the location where the value from the pipe is passed into your function, but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so a . inside it keeps its usual formula meaning, exactly as it would in code that uses no pipe at all. For example

    aggregate(. ~ cyl, data=mydata)
    

    And that's just because aggregate requires a formula with both a left-hand and a right-hand side. So the . at [1] just means "all the other columns in the dataset", that is, every column not already named in the formula. This use is not at all related to magrittr.
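
    To see that the formula dot has nothing to do with the pipe, here is a minimal base R sketch using the built-in mtcars data. The names by_dot and by_name are only illustrative, and the second call writes out just two of the columns that the . stands for.

    ```r
    # Base R only; mtcars ships with R, so no pipe or package is involved.
    by_dot <- aggregate(. ~ cyl, data = mtcars, FUN = mean)

    # The same grouping with two left-hand-side columns written out by name.
    by_name <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars, FUN = mean)

    names(by_dot)                        # cyl first, then every other column
    all.equal(by_dot$mpg, by_name$mpg)   # the mpg means should agree
    ```

    The . on the left of ~ is expanded by aggregate itself to every column that is not on the right-hand side.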

    The . at [2] is the value that's being passed in by the pipe. If you have a plain . as a parameter to the function, that's where the value will be placed. So the result of the subset() will go to the data= parameter.
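
    As a rough sketch of what the placeholder does, the piped call can be compared with the nested call it stands for. This assumes magrittr is installed; FUN is simplified to mean here, and the names piped and nested are only illustrative.

    ```r
    library(magrittr)  # provides %>%

    # Piped form: the result of subset() lands where the plain . appears (data = .).
    piped <- mtcars %>%
      subset(hp > 100) %>%
      aggregate(. ~ cyl, data = ., FUN = mean)

    # Nested form with no pipe: subset() is written directly into data =.
    nested <- aggregate(. ~ cyl, data = subset(mtcars, hp > 100), FUN = mean)

    all.equal(piped, nested)  # compare the two results
    ```

    Because . is given as an argument in its own right, magrittr places the value there instead of adding it as the first argument.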

    The magrittr library also allows you to define anonymous functions with the . variable. If a chain starts with a ., it's treated as a function definition rather than being evaluated right away. So

    . %>% mean %>% round(2)
    

    is the same as

    function(x) round(mean(x), 2)
    

    so you're just creating a custom function with the . at [3].
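
    A small check of that equivalence, assuming magrittr is installed. The names round_mean and plain are made up for this illustration.

    ```r
    library(magrittr)  # provides %>%

    # A chain that starts with . is stored as a function, not evaluated.
    round_mean <- . %>% mean %>% round(2)

    # The hand-written equivalent.
    plain <- function(x) round(mean(x), 2)

    is.function(round_mean)                               # the chain is callable
    identical(round_mean(mtcars$mpg), plain(mtcars$mpg))  # same value from both
    ```

    Passing round_mean as FUN to aggregate would therefore behave the same as passing the chain inline.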