Tags: r, ggplot2, visualization, distribution

How to pass parameters in stat_function() ggplot2


Let's say I have the following code:

get_histogram <- function(data_set, column_name, bin_width, attribute_name) {
  ggplot(data_set, aes(x= {{column_name}})) +
    geom_histogram(aes(y= ..density..), binwidth = .3, fill= "lightblue", colour="black") +
    xlab(paste0(attribute_name)) +
    stat_function(fun = dnorm , args= list(mean= mean({{column_name}}), sd= sd({{column_name}})), 
                  mapping = aes(colour = "Normal"))+
    stat_function(fun = dlnorm, args = list(meanlog= mean(log({{column_name}})), sdlog= sd(log({{column_name}}))),
                  mapping = aes(colour = "LogNormal")) + 
    scale_colour_manual("Distribution", values = c("red", "blue"))
}

data <- rnorm(1000, 44, 2)
df <- data.frame(data)
get_histogram(df, data, 1, "Test")

This runs without errors on the sample data, so the code appears to be correct.

Now, the problem arises when I run the same code on a dataset that I cannot share here. When I run only the first part of the function on it, the code works perfectly. That is, when I execute the following:

get_histogram_chunk <- function(data_set, column_name, bin_width, attribute_name) {
  ggplot(data_set, aes(x= {{column_name}})) +
    geom_histogram(aes(y= ..density..), binwidth = .3, fill= "lightblue", colour="black") +
    xlab(paste0(attribute_name)) 
}
get_histogram_chunk(new_df, columnX , 1, "Test")

As you can see, I am passing new_df and columnX, which come from my real dataset (which, as mentioned, I unfortunately cannot share). Again, this chunk works just fine. However, when I run the entire function, I get the error "object 'columnX' not found". While debugging, I found that the error occurs in the following line:

stat_function(fun = dnorm , args= list(mean= mean({{column_name}}), sd= sd({{column_name}}))

The issue is with {{column_name}}. I have checked everything, and the fact that the first chunk runs fine on the same data makes me confident that the issue is not with the dataset but with stat_function().

This code also works fine when I plot the graph outside of a function and pass the column name and data frame as hardcoded values. Therefore, I am sure that stat_function() doesn't like how I'm passing the column name. However, if that is indeed the case, why does it work with the sample data?

I understand that it's difficult to provide a precise answer without the actual data, but any guidance would be greatly appreciated. P.S. The minimum value of columnX is 50, so the log-normal density should work fine.


Solution

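  • Replace calls like mean({{column_name}}) with mean(data_set[[colname]]), and set colname <- as_label(enquo(column_name)) at the top of the function; that should work. Inside args = list(...), {{ }} is not data-masked: it is ordinary R code, so mean({{column_name}}) looks for an object with that name in the calling environment rather than for a column of data_set. The sample data only works because a global vector named data happens to exist alongside the column df$data; there is no global columnX, hence the error. Also, note that ..density.. has been deprecated in favor of after_stat(density).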

    library(ggplot2)
    get_histogram <- function(data_set, column_name, bin_width, attribute_name) {
      require(rlang)
      colname <- as_label(enquo(column_name))
      ggplot(data_set, aes(x= {{ column_name }})) +
        geom_histogram(aes(y= after_stat(density)), binwidth = bin_width, fill= "lightblue", colour="black") +
        xlab(paste0(attribute_name)) +
        stat_function(fun = dnorm , args= list(mean= mean(data_set[[colname]]), sd= sd(data_set[[colname]])), 
                      mapping = aes(colour = "Normal"))+
        stat_function(fun = dlnorm, args = list(meanlog= mean(log(data_set[[colname]])), sdlog= sd(log(data_set[[colname]]))),
                      mapping = aes(colour = "LogNormal")) + 
        scale_colour_manual("Distribution", values = c("red", "blue"))
    }
    
    data <- rnorm(1000, 44, 2)
    df <- data.frame(data)
    get_histogram(df, data, 1, "Test")
    #> Loading required package: rlang
    

    get_histogram(mtcars, mpg, 1, "MPG")
    

    Created on 2023-08-11 with reprex v2.0.2
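
As a rough illustration of why the original version runs on the sample data but fails on the real data, here is a minimal base-R sketch. col_mean() is a hypothetical helper (not part of ggplot2 or rlang) that mimics what happens inside args = list(...): outside a data-masking function, {{ column_name }} is just two nested braces, so the argument is evaluated as an ordinary object in the caller's environment. The columnX values below are made up for the example.

```r
# Hypothetical helper: mimics mean({{column_name}}) inside args = list(...).
# There is no data mask here, so {{ }} is just `{` applied twice.
col_mean <- function(data_set, column_name) {
  mean({{ column_name }})
}

set.seed(1)
data <- rnorm(1000, 44, 2)   # vector named `data` in the calling environment
df <- data.frame(data)       # column that is also named `data`

# Works, but only because the vector `data` exists in the calling environment;
# the column df$data is never looked up.
sample_mean <- col_mean(df, data)

# Made-up stand-in for the real dataset: no object called `columnX` exists
# outside the data frame.
new_df <- data.frame(columnX = rnorm(1000, 60, 2))
real_msg <- tryCatch(
  col_mean(new_df, columnX),
  error = function(e) conditionMessage(e)
)
print(real_msg)
```

Looking the column up in the data frame itself, as data_set[[colname]] does in the function above, avoids depending on whatever happens to be in the calling environment.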