How can I get data attributes from rlang's .data like I can with .?


I am building a tidy-compatible function for use inside dplyr's mutate where I'd like to pass a variable and also the data set I'm working with, and use information from both to build a vector.

As a basic example, imagine I want to return a string containing the mean of the variable and the number of rows in the data set. (I know I could just take the length of var; ignore that, it's only an example.)

library(tidyverse)
library(rlang)

info <- function(var, df = get(".", envir = parent.frame())) {
  # df defaults to ".", the full data set piped in by %>%
  paste(mean(var), nrow(df), sep = ", ")
}

dat <- data.frame(a = 1:10, i = c(rep(1,5),rep(2,5)))

#Works fine, 'types' contains '5.5, 10'
dat %>% mutate(types = info(a))

Ok, great so far. But now suppose I want it to work with grouped data. Then var will hold just one group's values, but . will still be the full data set. So instead I'll try rlang's .data pronoun, which refers only to the data currently being worked on (the current group).

However, .data is not like the dot. . is the data set itself, while .data is only a pronoun from which I can pull variables with .data[[varname]].
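For illustration, here is a minimal sketch (assuming dplyr 1.0 or later) of what .data does give you inside a grouped mutate(): .data[["a"]] returns only the current group's values of a, and dplyr's n() returns the current group's size. The column names grp_mean and grp_n are just illustrative.

```r
library(dplyr)

dat <- data.frame(a = 1:10, i = c(rep(1, 5), rep(2, 5)))

# Inside a grouped mutate(), .data[["a"]] holds only the current group's
# values of a, and n() is the number of rows in the current group.
res <- dat %>%
  group_by(i) %>%
  mutate(
    grp_mean = mean(.data[["a"]]),
    grp_n    = n()
  ) %>%
  ungroup()

print(res)
```

There is no pronoun-level nrow() here: per-group facts have to be pulled out column by column or through helpers such as n().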

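info2 <- function(var, df = get(".data", envir = parent.frame())) {
  # df is the .data pronoun here, not a data frame, so nrow() has no row count to give
  paste(mean(var), nrow(df), sep = ", ")
}

# Doesn't work: the row-count part of 'types' comes out blank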
dat %>% group_by(i) %>% mutate(types = info2(a))

How can I get the full data frame from .data? I didn't include it in the example, but specifically I need both some information from attributes(dat) AND some information from the variables in dat, properly subsetted for the current group. So neither falling back to . nor just pulling out individual variables and working from those would do the job.
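One possible workaround, sketched under the assumption that dplyr 1.1.0 or later is available: rather than trying to recover a data frame from .data, pass the current group's columns in explicitly with pick(everything()), and pass the full data set (and therefore its attributes) as a separate argument. The helper name info3 and the "source" attribute are hypothetical, and note that pick() leaves out the grouping columns.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0

dat <- data.frame(a = 1:10, i = c(rep(1, 5), rep(2, 5)))
attr(dat, "source") <- "example"  # hypothetical custom attribute

# var:  the current group's values of one column
# grp:  the current group's rows (a tibble built by pick())
# full: the complete, ungrouped data set, passed in explicitly
info3 <- function(var, grp, full) {
  paste(mean(var), nrow(grp), nrow(full), attr(full, "source"), sep = ", ")
}

res <- dat %>%
  group_by(i) %>%
  mutate(types = info3(a, pick(everything()), dat)) %>%
  ungroup()

print(res$types)
```

Passing the data explicitly avoids looking up . or .data with get(), so the function no longer depends on how the pipe or the data mask binds those names.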


Solution

  • As Alexis mentioned in a comment above, this is not possible, since it's not the intended use of .data. Having given up on doing it directly, I've worked out a kludge that combines . and .data.

    info <- function(var,df = get(".",envir = parent.frame())) {
      # First, get any information you need from . (the full data set)
      fulldatasize <- nrow(df)
    
      # Then check whether you actually need .data,
      # i.e. whether the data is grouped and var is only a subsample
      if (length(var) < nrow(df)) {
          # If so, get the names of the variables you want from .data (here, all of them)
          namesiwant <- names(df)
    
          # Get the .data pronoun from the calling data mask
          datapronoun <- get(".data", envir = parent.frame())
    
          # Rebuild df from just the current group's values
          df <- data.frame(lapply(namesiwant, function(x) datapronoun[[x]]))
          names(df) <- namesiwant
      }
    
      # From here on, df holds only the current group's rows when the data is grouped
      groupsize <- nrow(df)
    
      paste(mean(var),groupsize,fulldatasize,sep=', ')
    }
    
    dat <- data.frame(a = 1:10, i = c(rep(1,5),rep(2,5)))
    
    #types contains the within-group mean, then 5, then 10
    dat %>% group_by(i) %>% mutate(types = info(a))
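A sketch of a less roundabout variant of the same idea, assuming dplyr 1.0.0 or later: cur_group_rows() returns the row indices of the current group, so the full data set can be subset directly instead of being rebuilt column by column from .data, and the grouping column i is kept. This assumes the data reaching mutate() is dat itself in its original row order (no earlier filter() or arrange() in the pipeline); info4 and the "source" attribute are hypothetical.

```r
library(dplyr)  # cur_group_rows() requires dplyr >= 1.0.0

dat <- data.frame(a = 1:10, i = c(rep(1, 5), rep(2, 5)))
attr(dat, "source") <- "example"  # hypothetical custom attribute

# full: the complete data set; rows: the current group's row indices in it
info4 <- function(var, full, rows) {
  grp <- full[rows, , drop = FALSE]  # every column, only this group's rows
  paste(mean(var), nrow(grp), nrow(full), attr(full, "source"), sep = ", ")
}

res <- dat %>%
  group_by(i) %>%
  mutate(types = info4(a, dat, cur_group_rows())) %>%
  ungroup()

print(res$types)
```

Calling cur_group_rows() in the mutate() call itself, rather than inside the helper, keeps info4 an ordinary function that can be tested outside dplyr.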