Tags: r, function, ggplot2, plyr

"Error in as.double(x)..." and other error messages while creating a function to summarize and graph data


Long-time listener, first-time caller here. For a project I am working on, I often end up making the same graphs with just different response variables, so I am trying to write a function based on the ddply() and ggplot() code I keep reusing:

(df.smpl is the dataframe I am working with, genotype is the treatment I am interested in, and var is the stand-in for a response variable I am interested in)

const.gra<-function(var){
  ## First, summarise the data to be used in subsequent ggplot code
  summ<-ddply(df.smpl, "genotype", summarise,
                        N = length(var),
                        mean = mean(var),
                        sd = sd(var),
                        se = sd/sqrt(N))
  # Now graph
  ggplot(data=summ, aes(genotype, mean))+
    geom_col(position = "dodge")+
    geom_errorbar(aes(ymin=mean-se, ymax=mean+se),
                  width=.2,
                  position=position_dodge(.9))+
    scale_x_discrete(name = "Genotype",
                     breaks=c("K","PW", "AW"),
                     labels=c("Plant K", "Plant PW", "Plant AW"))+
    scale_y_continuous(name = "Title")+
    theme(legend.position = "none", 
          legend.justification = c(1,1),
          panel.background = element_rect(fill = "white"),
          legend.key = element_rect(fill = "white"),
          axis.line = element_line(colour = "black"),
          axis.ticks.x = element_blank(),
          axis.text = element_text(size = 14),
          axis.title = element_text(size = 14),
          legend.text = element_text (size = 14),
          legend.title = element_text (size = 14))
}
const.gra(df.smpl$bgbm..mg.)

But the code above yields the following error messages:

Error in as.double(x) : 
cannot coerce type 'closure' to vector of type 'double'
In addition: Warning message:
In mean.default(var) : argument is not numeric or logical: returning NA

I tried solving it on my own but have been unsuccessful so far. The code runs just fine if I run it verbatim outside of the function.

Based on some answers I found online about this error, I tried renaming objects whose names sounded like they could clash with common base R function names, but no luck so far... :(


Solution

  • There are a few things to unpack here.

    First, the error messages come from mean(var) and sd(var). Inside the plyr::summarise call, R first looks for a column called var in your data frame. When it doesn't find one, it does not fall back to the var argument of const.gra; instead the lookup continues outward to the environment you are calling const.gra from and then the search path. There it finds the var() function from the stats package, which is attached by default, and passes that function to mean() and sd(), which expect a numeric vector. mean() warns and returns NA, and sd() fails when it tries to coerce the function with as.double().
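    The lookup can be reproduced in plain base R. The sketch below is a simplified model, not plyr's actual source: lookup_var() is a hypothetical helper that evaluates the symbol var inside a data frame that has no such column, with the global environment standing in for the outer environment that plyr::summarise ends up searching. It assumes you have not defined anything called var in your global environment yourself.

```r
# Hypothetical helper: a simplified model of how the symbol `var` gets
# resolved inside plyr::summarise (NOT plyr's real implementation).
lookup_var <- function(var, use_own_frame = FALSE) {
  df <- data.frame(genotype = c("K", "PW", "AW"))  # no column named `var`
  # Columns of `df` are searched first, then the enclosing environment.
  enclos <- if (use_own_frame) environment() else globalenv()
  eval(quote(var), df, enclos)
}

# Enclosure = global environment: the search path is reached and the
# var() function from the stats package is returned instead of the argument.
found <- lookup_var(c(1, 2, 3))
is.function(found)              # TRUE
identical(found, stats::var)    # TRUE

# Enclosure = the function's own frame: the argument itself is found.
lookup_var(c(1, 2, 3), use_own_frame = TRUE)  # 1 2 3

# Handing the function to mean() and sd() reproduces both messages.
suppressWarnings(mean(found))                  # NA (plus the "not numeric or logical" warning)
tryCatch(sd(found), error = conditionMessage)
# "cannot coerce type 'closure' to vector of type 'double'"
```

    Evaluating with the function's own frame as the enclosure (use_own_frame = TRUE) finds the argument instead, which is the behaviour the question's code was implicitly relying on.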

    Second, the plyr package is retired, and its developers recommend using dplyr instead.

    Based on some quick experiments, I don't think plyr supports the {{ }} ("curly-curly") non-standard evaluation syntax that tidyverse packages now offer. Luckily, the two packages are compatible enough that you can use dplyr::summarise inside the plyr::ddply call, and things work without changing much code.

    That said, I would advise dropping plyr completely. Below you can find both ways of doing things. Note the loading order: if you load dplyr first and then plyr, dplyr's summarise() is masked by plyr's, and ddply() will no longer pick up the dplyr version.

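    library(plyr)
    library(dplyr)  # load dplyr AFTER plyr so dplyr::summarise masks plyr::summarise
    
    # Option 1: keep ddply() for the split-apply-combine step; `summarise`
    # here resolves to dplyr::summarise, which understands {{ }}
    func_nse <- function(y, x) {
      ddply(y, "vs", summarise,
            N = length({{x}}),
            mean = mean({{x}}),
            sd = sd({{x}}),
            se = sd/sqrt(N))
    }
    
    # Option 2: dplyr only (recommended, since plyr is retired)
    func_dplyr <- function(y, x) {
      y %>%
        group_by(vs) %>%
        summarise(N = length({{x}}),
                  mean = mean({{x}}),
                  sd = sd({{x}}),
                  se = sd/sqrt(N))
    }
    
    # x is passed as a bare column name (mpg), not as mtcars$mpg
    func_nse(mtcars, mpg)
    func_dplyr(mtcars, mpg)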
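    Applied to the original question, the dplyr route could give a graphing function along the following lines. This is a sketch under assumptions: df_example is a small made-up data frame standing in for df.smpl (only the genotype levels and the bgbm..mg. column name come from the question), and the names summarise_response() and const_gra() are illustrative. The response column is passed as a bare name; passing df.smpl$bgbm..mg. would hand over the whole column as a plain vector, so every genotype would get the same summary even once the lookup problem is fixed. Legend-related theme settings are left out because this plot has no legend.

```r
library(dplyr)
library(ggplot2)

# Hypothetical example data standing in for df.smpl.
df_example <- data.frame(
  genotype  = rep(c("K", "PW", "AW"), each = 4),
  bgbm..mg. = c(10, 12, 11, 13, 20, 22, 21, 19, 15, 14, 16, 17)
)

# Per-genotype summary of one response column, passed as a bare name.
summarise_response <- function(data, response) {
  data %>%
    group_by(genotype) %>%
    summarise(N = n(),
              mean = mean({{ response }}),
              sd = sd({{ response }}),
              se = sd / sqrt(N),
              .groups = "drop")
}

# Bar chart of the means with +/- 1 standard error bars.
const_gra <- function(data, response, y_title = "Title") {
  summ <- summarise_response(data, {{ response }})  # forward the bare name
  ggplot(summ, aes(genotype, mean)) +
    geom_col() +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
    scale_x_discrete(name = "Genotype",
                     breaks = c("K", "PW", "AW"),
                     labels = c("Plant K", "Plant PW", "Plant AW")) +
    scale_y_continuous(name = y_title) +
    theme(panel.background = element_rect(fill = "white"),
          axis.line = element_line(colour = "black"),
          axis.ticks.x = element_blank(),
          axis.text = element_text(size = 14),
          axis.title = element_text(size = 14))
}

p <- const_gra(df_example, bgbm..mg., y_title = "bgbm (mg)")
print(p)
```

    To graph another response variable, call const_gra() again with a different bare column name and a matching y_title.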