r dplyr environment

Passing the attached data frame to a function


I am working on a function that combines information about a particular variable with some basic information about the data frame it comes from. Here is an example of what I'm talking about:

fcn <- function(var,data) {
  return(ncol(data)*mean(var))
}

library(dplyr)  # re-exports the %>% pipe used below

df <- data.frame(a=1:10,b=1:10)

df %>% dplyr::mutate(c=fcn(a,df))

This works fine! However, it would be really neat if, in cases where the function is called inside with() or a dplyr verb, I could just nab the data frame/tibble object without it being explicitly passed. So ideally something like

fcn <- function(var,data=attached_data_object) {
  return(ncol(data)*mean(var))
}

df <- data.frame(a=1:10,b=1:10)

df %>% dplyr::mutate(c=fcn(a))

I've been reading up on the various environment functions - seems like I should be able to reach into the environment that with/dplyr creates from the data frame and pluck the whole thing out wholesale. As of yet I have been unable to figure out how to make this happen. Any tips appreciated! Thank you.


Solution

  • (With apologies to Hadley if I get terms slightly wrong). You might find the chapters on Environments and NSE (non-standard evaluation) from Advanced R useful.

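    In a pipeline written with %>%, the dataframe/tibble on the left-hand side is bound to the placeholder ".". Strictly speaking that placeholder comes from the magrittr pipe rather than from the dplyr verbs themselves, and it is the same "." that another answer here uses to refer to the dataframe. Inside mutate(), column names are looked up in the data first and then in the environment the verb was called from. When you call a function from within mutate(), as you are doing here, you want to access this object called ".". It does not live in your function's own execution environment, but it can be reached by searching upwards from the environment the function was called from, i.e. parent.frame(). So how do we do that?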

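    library(dplyr)  # provides mutate() and re-exports %>%

    fcn <- function(var) {
      # "." is bound by %>%; search for it starting from the calling environment
      dat <- get(".", envir = parent.frame())
      return(ncol(dat) * mean(var))
    }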
    
    notacol <- 8
    df <- data.frame(a=1:10, b=seq(10, 100, 10))
    df
        a   b
    1   1  10
    2   2  20
    3   3  30
    4   4  40
    5   5  50
    6   6  60
    7   7  70
    8   8  80
    9   9  90
    10 10 100
    
    
    df %>% mutate(c = fcn(a), d = fcn(b), e = fcn(notacol))
        a   b  c   d  e
    1   1  10 11 110 16
    2   2  20 11 110 16
    3   3  30 11 110 16
    4   4  40 11 110 16
    5   5  50 11 110 16
    6   6  60 11 110 16
    7   7  70 11 110 16
    8   8  80 11 110 16
    9   9  90 11 110 16
    10 10 100 11 110 16
    

    I think this is the behaviour you were after. Note that notacol isn't a column of the dataframe, so it isn't found in the data; the lookup then carries on to the Global Env, which is where it is found.
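
    One caveat worth spelling out: "." is only bound when the call sits inside a %>% pipeline, so the lookup fails inside with(df, ...) or a plain mutate(df, ...) call. Below is a minimal base-R sketch of a more defensive variant, assuming you are happy to fall back to an explicit data argument. The helper find_piped_data and the data = NULL default are hypothetical names introduced here for illustration, and local() is used only to imitate the binding that the pipe creates.

    ```r
    # Hypothetical helper: search for the pipe placeholder "." starting at
    # `env` and walking up its parents; return NULL if it is absent or is
    # not a data frame.
    find_piped_data <- function(env) {
      dat <- get0(".", envir = env, inherits = TRUE, ifnotfound = NULL)
      if (is.data.frame(dat)) dat else NULL
    }

    fcn <- function(var, data = NULL) {
      caller <- parent.frame()
      # Prefer an explicitly supplied data frame; otherwise try the placeholder.
      if (is.null(data)) data <- find_piped_data(caller)
      if (is.null(data)) {
        stop("no data frame found: pass `data` or call inside a %>% pipeline")
      }
      ncol(data) * mean(var)
    }

    df <- data.frame(a = 1:10, b = seq(10, 100, 10))

    # Explicit argument: does not depend on any pipe.
    explicit <- fcn(df$a, df)

    # Imitate the pipe by binding "." in the calling environment.
    simulated <- local({
      . <- df
      fcn(.$a)
    })

    # with() creates no "." binding, so the lookup fails with a clear error.
    failed <- inherits(try(with(df, fcn(a)), silent = TRUE), "try-error")
    ```

    Here explicit and simulated both come to 11, and failed is TRUE. The explicit argument is the part that keeps working regardless of how the function is called.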