long time listener and first time caller here
For a project I am working on, I often end up drawing the same graphs with only the response variable changed. So I am trying to write a function based on the ddply() and ggplot() code I keep reusing:
(df.smpl is the dataframe I am working with, genotype is the treatment I am interested in, and var is the stand-in for a response variable I am interested in)
const.gra <- function(var) {
  ## First, summarise the data to be used in subsequent ggplot code
  summ <- ddply(df.smpl, "genotype", summarise,
                N    = length(var),
                mean = mean(var),
                sd   = sd(var),
                se   = sd / sqrt(N))
  # Now graph
  ggplot(data = summ, aes(genotype, mean)) +
    geom_col(position = "dodge") +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                  width = .2,
                  position = position_dodge(.9)) +
    scale_x_discrete(name = "Genotype",
                     breaks = c("K", "PW", "AW"),
                     labels = c("Plant K", "Plant PW", "Plant AW")) +
    scale_y_continuous(name = "Title") +
    theme(legend.position = "none",
          legend.justification = c(1, 1),
          panel.background = element_rect(fill = "white"),
          legend.key = element_rect(fill = "white"),
          axis.line = element_line(colour = "black"),
          axis.ticks.x = element_blank(),
          axis.text = element_text(size = 14),
          axis.title = element_text(size = 14),
          legend.text = element_text(size = 14),
          legend.title = element_text(size = 14))
}
const.gra(df.smpl$bgbm..mg.)
But the above code yields the following error and warning messages:
Error in as.double(x) : cannot coerce type 'closure' to vector of type 'double'
In addition: Warning message:
In mean.default(var) : argument is not numeric or logical: returning NA
I tried solving it on my own but have been unsuccessful so far. The code runs just fine if I run it verbatim outside of the function.
Based on some answers I found online about this error, I tried renaming things that sounded like they could clash with common base R function names, but no luck thus far... :(
There are a few things to unpack here.
First, the messages come from sd(var) and mean(var): the error is raised by sd(var), which calls as.double() on its argument, and the warning by mean(var). At some point in the plyr::summarise call, R looks for a column called var in your data frame and, after not finding one, it looks in the parent environment from where you're calling const.gra. There it finds the var function in the stats package, which is loaded by default in R, and passes that function to mean() and sd(), neither of which accepts a function as its argument.
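The lookup problem can be reproduced outside of plyr with base R alone. This is only a minimal sketch: it assumes a fresh session where nothing else named var has been defined, and the names res and err are just placeholders for illustration.

```r
# With no column or variable called `var` in scope, the name resolves
# to the variance function from the stats package.
is.function(var)  # TRUE

# mean() warns and returns NA when handed a function instead of numbers
# (the warning is suppressed here so the result can be inspected).
res <- suppressWarnings(mean(var))

# sd() tries to coerce its argument with as.double(), which fails for a
# function; the error message is captured instead of stopping the script.
err <- tryCatch(sd(var), error = function(e) conditionMessage(e))

print(res)
print(err)
```

These are the same two symptoms as in the question: an NA with a warning from mean(), and a coercion error from sd().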
The second thing to note is that the plyr package is retired, and the developer's repo recommends that dplyr be used instead.
Based on some quick experiments I just ran, I don't think plyr supports the current non-standard evaluation syntax (the {{ }} "embrace" operator) that is available in tidyverse packages. Luckily, there seems to be enough compatibility between the two that you can use dplyr::summarise inside the plyr::ddply call, and things will work without changing too much code.
That said, I would advise you to drop plyr completely. Below you can find both ways of doing things. Be aware that if you load dplyr first and then plyr, the former's summarise will be masked by the latter's.
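# Load plyr first and dplyr second, so that summarise() resolves to dplyr's version
library(plyr)
library(dplyr)

# Option 1: keep ddply(), but let dplyr::summarise do the work.
# y is the data frame, x is the unquoted name of the response column.
func_nse <- function(y, x) {
  ddply(y, "vs", summarise,
        N    = length({{x}}),
        mean = mean({{x}}),
        sd   = sd({{x}}),
        se   = sd / sqrt(N))
}

# Option 2 (recommended): dplyr only
func_dplyr <- function(y, x) {
  y %>%
    group_by(vs) %>%
    summarise(N    = length({{x}}),
              mean = mean({{x}}),
              sd   = sd({{x}}),
              se   = sd / sqrt(N))
}

# Pass the column name unquoted
func_nse(mtcars, mpg)
func_dplyr(mtcars, mpg)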
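Applied to the original function, the dplyr approach might look roughly like the sketch below. Treat it as one possible rewrite rather than a drop-in replacement: the name const_gra, the extra data and y_title arguments, and the toy data frame df_toy with its bgbm column are all assumptions made for illustration, and the theme is trimmed to the settings that affect a plot without a legend.

```r
library(dplyr)
library(ggplot2)

# Hypothetical rewrite of const.gra(): the data frame is passed in explicitly
# and the response column is given unquoted, then embraced with {{ }}.
const_gra <- function(data, var, y_title = "Title") {
  # Summarise the response per genotype
  summ <- data %>%
    group_by(genotype) %>%
    summarise(N    = n(),
              mean = mean({{ var }}),
              sd   = sd({{ var }}),
              se   = sd / sqrt(N))

  # Bar chart of the means with +/- 1 standard error bars
  ggplot(summ, aes(genotype, mean)) +
    geom_col() +
    geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
    scale_x_discrete(name = "Genotype",
                     breaks = c("K", "PW", "AW"),
                     labels = c("Plant K", "Plant PW", "Plant AW")) +
    scale_y_continuous(name = y_title) +
    theme(panel.background = element_rect(fill = "white"),
          axis.line = element_line(colour = "black"),
          axis.ticks.x = element_blank(),
          axis.text = element_text(size = 14),
          axis.title = element_text(size = 14))
}

# Toy data standing in for df.smpl (made up for this example)
set.seed(1)
df_toy <- data.frame(genotype = rep(c("K", "PW", "AW"), each = 5),
                     bgbm = rnorm(15, mean = 10, sd = 2))

p <- const_gra(df_toy, bgbm, y_title = "Biomass (mg)")
print(p)
```

With your own data, the call would then be const_gra(df.smpl, bgbm..mg.) instead of const.gra(df.smpl$bgbm..mg.): the column name goes in unquoted and without the df.smpl$ prefix.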