I would like to use ggplot2 to create a barplot with SDM error bars from one set of data ($proteinN on the y axis, $method on the x axis) and, in the same barplot, overlay a second set of data ($specific) in the style of a bullet bar chart, with its own indicator in the legend. Something a little like this (but with vertical bars, and with the SDM for the first set of data):
[Example bullet bar chart image (source: yaksis.com)]
Here is my code and data:
library(ggplot2)
data <- textConnection("proteinN, supp, method, specific
293, protnumb, insol, 46
259, protnumb, insol, 46
274, protnumb, insol, 46
359, protnumb, fasp, 49
373, protnumb, fasp, 49
388, protnumb, fasp, 49
373, protnumb, efasp, 62
384, protnumb, efasp, 62
382, protnumb, efasp, 62
")
data <- read.csv(data, h=T)
# create functions to get the lower and upper bounds of the error bars
stderr <- function(x){sqrt(var(x,na.rm=TRUE)/length(na.omit(x)))}
lowsd <- function(x){return(mean(x)-stderr(x))}
highsd <- function(x){return(mean(x)+stderr(x))}
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
"#F0E442", "#0072B2", "#D55E00", "#CC79A7")
# create a ggplot
ggplot(data=data,aes(x=method, y=proteinN, fill=method))+
#Change _hue by _manual and remove c=45, l=80 if not desired#
scale_fill_manual(values=cbPalette)+
scale_fill_hue(c=45, l=80)+
# first layer is barplot with means
stat_summary(fun.y=mean, geom="bar", position="dodge", colour='black')+
# second layer overlays the error bars using the functions defined above
stat_summary(fun.y=mean, fun.ymin=lowsd, fun.ymax=highsd,
geom="errorbar", position="dodge",color = 'black', size=.5)
I tried a few things but nothing worked, and whenever I try to add the second set of data I get this error:
Error : Mapping a variable to y and also using stat="bin". With stat="bin", it will attempt to set the y value to the count of cases in each group. This can result in unexpected behavior and will not be allowed in a future version of ggplot2. If you want y to represent counts of cases, use stat="bin" and don't map a variable to y. If you want y to represent values in the data, use stat="identity". See ?geom_bar for examples. (Defunct; last used in version 0.9.2)
Here is my attempt:
# create functions to get the lower and upper bounds of the error bars
stderr <- function(x){sqrt(var(x,na.rm=TRUE)/length(na.omit(x)))}
lowsd <- function(x){return(mean(x)-stderr(x))}
highsd <- function(x){return(mean(x)+stderr(x))}
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
"#F0E442", "#0072B2", "#D55E00", "#CC79A7")
# create a ggplot
ggplot(data=data,aes(x=method, y=proteinN, fill=method, witdh=1))+
#Change _hue by _manual and remove c=45, l=80 if not desired#
scale_fill_manual(values=cbPalette)+
scale_fill_hue(c=45, l=80)+
#Second set of data#
geom_bar(aes(x=method, y=specific, fill="light green"), width=.4) +
# first layer is barplot with means
stat_summary(fun.y=mean, geom="bar", position="dodge", colour='black')+
# second layer overlays the error bars using the functions defined above
stat_summary(fun.y=mean, fun.ymin=lowsd, fun.ymax=highsd,
geom="errorbar", position="dodge",color = 'black', size=.5)
Maybe try something like this?
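# uses data, lowsd() and highsd() as defined in the question
ggplot(data=data,aes(x=method, y=proteinN, fill=method, width=1))+
scale_fill_hue(c=45, l=80) +
# bars showing the mean proteinN per method
stat_summary(fun.y=mean, geom="bar", position="dodge", colour='black')+
# error bars: mean +/- one standard error
stat_summary(fun.y=mean, fun.ymin=lowsd, fun.ymax=highsd,
geom="errorbar", position="dodge",color = 'black', size=.5) +
# overlaid narrower bars: one row per method, heights taken as-is from specific
geom_bar(data = unique(data[,c('method','specific')]),
aes(x = method,y = specific),
stat = "identity",
fill = "light green",
width = 0.5)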
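For reference, here is a sketch of the same plot written against newer ggplot2 releases, assuming ggplot2 3.3.0 or later, where fun.y, fun.ymin and fun.ymax were deprecated in favour of fun, fun.min and fun.max. It swaps the hand-written lowsd()/highsd() for ggplot2's own mean_se(), which also returns the mean plus or minus one standard error. The data frame is typed in directly so the block runs on its own; the variable name p is just my choice.

```r
library(ggplot2)

# The question's data, typed in directly
data <- data.frame(
  proteinN = c(293, 259, 274, 359, 373, 388, 373, 384, 382),
  method   = rep(c("insol", "fasp", "efasp"), each = 3),
  specific = rep(c(46, 49, 62), each = 3)
)

p <- ggplot(data, aes(x = method, y = proteinN, fill = method)) +
  scale_fill_hue(c = 45, l = 80) +
  # bar height = mean proteinN per method (fun replaces the deprecated fun.y)
  stat_summary(fun = mean, geom = "bar", colour = "black") +
  # mean_se() gives mean - 1 SE and mean + 1 SE, like lowsd()/highsd()
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.2) +
  # overlaid "bullet" bars: one row per method, y used as-is
  geom_bar(data = unique(data[, c("method", "specific")]),
           aes(x = method, y = specific), inherit.aes = FALSE,
           stat = "identity", fill = "lightgreen", width = 0.5)

print(p)
```

inherit.aes = FALSE keeps the overlay layer from picking up y = proteinN and fill = method from the main ggplot() call, so its data frame only needs the two columns it actually uses.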
A couple of notes.
You misspelled "width".
Your two scale_fill lines are pointless. ggplot will only take one fill scale, whichever one appears last. You can't "modify" the fill scale like that. You ought to have received a warning about it that explicitly said:
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the existing scale.
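A quick way to see which fill scale survives is to look at the colours ggplot2 actually draws. The sketch below assumes a ggplot2 version that provides layer_data(); the small data frame reuses the question's method and specific values, and the names df, p and fills are my own.

```r
library(ggplot2)

# One row per method, taken from the question's data
df <- data.frame(method = c("insol", "fasp", "efasp"), specific = c(46, 49, 62))
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73",
               "#F0E442", "#0072B2", "#D55E00", "#CC79A7")

p <- ggplot(df, aes(x = method, y = specific, fill = method)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = cbPalette) +  # added first...
  scale_fill_hue(c = 45, l = 80)           # ...then replaced by this one

# layer_data() returns the data as drawn, including the final fill colours
fills <- layer_data(p)$fill
print(fills)
print(any(fills %in% cbPalette))  # FALSE: the manual palette was discarded
```

Swapping the order of the two scale lines flips the result, which is the "whichever one appears last" rule in action.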
The error message you got said:
Mapping a variable to y and also using stat="bin"
i.e. you specified y = proteinN while also using stat = "bin" in geom_bar (the default). It went on to explain:
With stat="bin", it will attempt to set the y value to the count of cases in each group.
i.e. rather than plot the values in y, it will try to count the number of instances of, say, insol, and plot that. (Three, in this case.) A cursory look at the examples in ?geom_bar immediately reveals that most of them only specify an x variable, until you get to this example in the help:
# When the data contains y values in a column, use stat="identity"
library(plyr)
# Calculate the mean mpg for each level of cyl
mm <- ddply(mtcars, "cyl", summarise, mmpg = mean(mpg))
ggplot(mm, aes(x = factor(cyl), y = mmpg)) + geom_bar(stat = "identity")
where it demonstrates that when you specify the precise y values you want, you also have to say stat = "identity". Conveniently, the error message also said this:
If you want y to represent values in the data, use stat="identity".
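The difference between counting and taking y as-is can be checked directly on the question's data. In ggplot2 2.0.0 and later the default stat for geom_bar() is called "count" rather than "bin", but the behaviour described above is the same. This sketch assumes layer_data() is available; the names counted, spec and ident are my own.

```r
library(ggplot2)

# The question's data, typed in directly
data <- data.frame(
  proteinN = c(293, 259, 274, 359, 373, 388, 373, 384, 382),
  method   = rep(c("insol", "fasp", "efasp"), each = 3),
  specific = rep(c(46, 49, 62), each = 3)
)

# Default stat: only x is mapped, and bar height is the number of rows per method
counted <- layer_data(ggplot(data, aes(x = method)) + geom_bar())
print(counted[, c("x", "count")])  # count is 3 for every method

# stat = "identity": y is taken from the data as-is (one row per method here)
spec <- unique(data[, c("method", "specific")])
ident <- layer_data(ggplot(spec, aes(x = method, y = specific)) +
                      geom_bar(stat = "identity"))
print(ident[, c("x", "y")])
```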
The final piece was knowing that, since the overlaid bars only have one value per x value, we should collapse that part of the data down to the minimum information needed via:
unique(data[,c('method','specific')])
or just split it off into its own data frame ahead of time.
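The "split it off ahead of time" route can be taken one step further by precomputing the means and standard errors as well, so every layer plots prepared values with stat = "identity". The sketch below does that and also tries to give the overlaid bars their own legend entry, as the question asked, by mapping colour to a constant label. That colour-mapping trick and the names spec_df, summ and p are my own choices, not part of the answer above, and it assumes a reasonably recent ggplot2.

```r
library(ggplot2)

# The question's data, typed in directly
data <- data.frame(
  proteinN = c(293, 259, 274, 359, 373, 388, 373, 384, 382),
  method   = rep(c("insol", "fasp", "efasp"), each = 3),
  specific = rep(c(46, 49, 62), each = 3)
)

# One row per method for the overlaid bars
spec_df <- unique(data[, c("method", "specific")])

# One row per method with the mean and standard error of proteinN
m  <- tapply(data$proteinN, data$method, mean)
se <- tapply(data$proteinN, data$method, function(x) sd(x) / sqrt(length(x)))
summ <- data.frame(method = names(m), mean = as.numeric(m), se = as.numeric(se))

p <- ggplot() +
  geom_bar(data = summ, aes(x = method, y = mean, fill = method),
           stat = "identity", colour = "black") +
  geom_errorbar(data = summ, aes(x = method, ymin = mean - se, ymax = mean + se),
                width = 0.2) +
  # mapping colour to a constant label (assumed trick) creates a separate legend entry
  geom_bar(data = spec_df, aes(x = method, y = specific, colour = "specific"),
           stat = "identity", fill = "lightgreen", width = 0.5) +
  scale_fill_hue(c = 45, l = 80) +
  scale_colour_manual(name = NULL, values = c(specific = "darkgreen"))

print(p)
```

Because each layer gets exactly one row per method, no stat has to guess what y means, and the overlay's legend key should show up separately from the method fill legend.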