r dplyr sparklyr

How to use an R function in sparklyr


I have been researching online how to use an R function with sparklyr, but I am still having a hard time figuring it out. Please help.

My initial code looks like:

whatever %>%
group_by(a) %>%
summarize(count=n()) %>%
collect() %>%
ggplot(aes(x=a, y=count)) +
geom_point()

I want to repeat this multiple times since there are other columns I want to check with the same function.

So I wrote:

point_dist <- function(dta, vari) {
dta %>%
group_by(vari) %>%
summarize(count=n()) %>%
collect() %>%
ggplot(aes(x=vari, y=count)) +
geom_point()
}

point_dist(whatever, a)

but R keeps telling me:

Error in eval_bare(sym, env) : object 'a' not found

I don't know why.

I also don't know whether this is the right direction to take.

Thanks again.


Solution

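  • Your issue is related to the non-standard evaluation (NSE) that dplyr functions use. When you pass a in your call to point_dist, R eventually attempts to evaluate it as an ordinary variable, which fails because a is a column of your table rather than an object in your environment. (It's even more confusing when a variable with that name happens to exist in your calling environment or higher, because the code may then run without this error and use that variable instead.)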

    NSE in dplyr means you can do something like select(mtcars, cyl), whereas with most standard-evaluation functions, you'll need myfunc(mtcars, "cyl"), since there isn't a variable named cyl in the calling environment.
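
    To make the difference concrete, here is a minimal sketch using the built-in mtcars data frame (a local stand-in, not your Spark table). The names nse_result, se_result and bare_lookup are only illustrative.

    ```r
    library(dplyr)

    # NSE: select() looks up the bare name cyl among the columns of mtcars.
    nse_result <- select(mtcars, cyl)

    # Standard evaluation: base `[` needs the column name as a string.
    se_result <- mtcars["cyl"]

    # Outside a dplyr verb there is no variable named cyl, so plain evaluation
    # fails; tryCatch() captures the error message instead of stopping.
    bare_lookup <- tryCatch(cyl, error = function(e) conditionMessage(e))

    # Both approaches pick out the same column values.
    identical(nse_result$cyl, se_result$cyl)
    ```

    The failed bare lookup is the same kind of failure as the 'a' not found error in your function.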

    In your case, try:

    point_dist <- function(dta, vari) {
      vari <- enquo(vari)
      dta %>%
        group_by(!!vari) %>%
        summarize(count=n()) %>%
        collect() %>%
        ggplot(aes(x=!!vari, y=count)) +
        geom_point()
    }
    

    This method of dealing with unquoted column-names in your functions can be confusing if you're familiar with normal R function definitions and/or are not familiar with NSE. This can be a good template for you if that's as far as you're going to go with it, otherwise I strongly urge you to read a little more at the first reference below.
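
    As a sketch of how the template can be reused across several columns, the version below uses the embrace operator {{ }} (available in rlang 0.4.0 and later), which does the enquo() and !! steps in one go. The built-in mtcars data frame is only a local stand-in for your Spark table, and the columns cyl and gear are illustrative assumptions; with a tbl_spark the pattern should be the same.

    ```r
    library(dplyr)
    library(ggplot2)

    point_dist <- function(dta, vari) {
      dta %>%
        group_by({{ vari }}) %>%             # capture and unquote the column in one step
        summarize(count = n()) %>%
        collect() %>%                        # no-op for a local data frame; pulls rows from Spark otherwise
        ggplot(aes(x = {{ vari }}, y = count)) +
        geom_point()
    }

    # Same function, different columns, all passed as bare names.
    p_cyl  <- point_dist(mtcars, cyl)
    p_gear <- point_dist(mtcars, gear)
    ```

    Printing p_cyl or p_gear draws the plot. The enquo() and !! form above behaves the same way; {{ }} is just shorter when the argument is passed straight through.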

    Some good references for NSE, specifically in/around tidyverse stuff: