I would like to make a plot similar to this one for the iris data, with summary statistics shown on the plot: https://i.sstatic.net/Zyv2s.jpg
I am following this post over here: How to add summary statistics in histogram plot using ggplot2?
df <- iris
df.m <- melt(df, id="Species")
#Calculating the summary statistics
summ <- df.m %>%
group_by(variable) %>%
summarize(min = min(value), max = max(value),
mean = mean(value), q1= quantile(value, probs = 0.25),
median = median(value), q3= quantile(value, probs = 0.75),
sd = sd(value))
I then modified the code to make density plots instead of histograms:
p1 <- ggplot(df.m) + geom_density(aes(x = value), fill = "grey", color = "black") +
facet_wrap(~variable, scales="free", ncol = 2)+ theme_bw()
I seem to be having a problem over here:
p1+geom_density(data=summ,label =split(summ,summ$variable),
npcx = 0.00, npcy = 1, hjust = 0, vjust = 1,size=2)
Does anyone know what the problem is? Also, is it possible to accomplish this with only ggplot2? I am working with a computer where I do not have the admin privileges to download many libraries (I have reshape2, dplyr, ggplot2). Should this be done using the annotate() function in ggplot2? And is there a way to change the x-axis for each graph to "log"?
I would suggest the following approach, since you only have a few packages. You can add the summary as a text annotation, but you will need to adjust the position of the text for each group. A log() transformation is also possible if you apply it inside aes() when building the plot. I will show you two ways to do the annotations.
library(ggplot2)
library(dplyr)
library(reshape2) # provides melt()
#Data
df <- iris
df.m <- melt(df, id="Species")
Here, we create the annotations:
#Calculating the summary statistics and create the label
summ <- df.m %>%
  group_by(variable) %>%
  summarize(min = min(value), max = max(value),
            mean = mean(value), q1 = quantile(value, probs = 0.25),
            median = median(value), q3 = quantile(value, probs = 0.75),
            sd = sd(value)) %>%
  mutate_if(is.numeric, round, digits = 2) %>%
  mutate(lab = paste("min = ", min, "\nmax = ", max, "\nmean = ", mean,
                     "\nq1 = ", q1, "\nmedian = ", median, "\nq3 = ", q3, "\nsd = ", sd),
         position = c(1.5, 0.8, 0.25, -2)) %>%
  select(variable, lab, position)
If you want to control where the labels go, modify the position variable created in the previous step. It sets the x position of each label, in log(value) units because the plot uses a log-transformed x. The code for the plot is:
#Plot
p1 <- ggplot(df.m) + geom_density(aes(x = log(value)), fill = "grey", color = "black") +
facet_wrap(~variable, scales="free", ncol = 2)+ theme_bw()
p1 <- p1 + geom_text(data = summ, aes(x=position, label = lab), y=Inf, hjust=1, vjust=1.2, size=3)
p1
The output:
In that plot, the x position of each annotation comes from summ. If you would rather not set it by hand, use the following code instead, which pins every label to the top-right corner of its panel:
p1 <- ggplot(df.m) + geom_density(aes(x = log(value)), fill = "grey", color = "black") +
facet_wrap(~variable, scales="free", ncol = 2) + theme_bw()
p1 <- p1 + geom_text(data = summ, aes(label = lab), x = Inf, y = Inf, hjust = 1, vjust = 1.2, size = 3)
p1
The output:
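The attempt in the question asked for a top-left label (hjust = 0, vjust = 1). As a rough sketch of the same idea with geom_text(), the label can be pinned to the left edge with x = -Inf. The names summ2 and p3, the shortened label, and the hjust/vjust offsets are illustrative choices, not part of the code above:

```r
library(ggplot2)
library(dplyr)
library(reshape2)

# Long format, as in the code above
df.m <- melt(iris, id = "Species")

# Shortened label (mean and sd only) to keep the sketch compact;
# summ2 is a hypothetical name used only in this example
summ2 <- df.m %>%
  group_by(variable) %>%
  summarize(lab = paste0("mean = ", round(mean(value), 2),
                         "\nsd = ", round(sd(value), 2)))

# x = -Inf and y = Inf pin the text to the top-left corner of each panel;
# the slightly negative hjust nudges it away from the panel border
p3 <- ggplot(df.m) +
  geom_density(aes(x = log(value)), fill = "grey", color = "black") +
  facet_wrap(~variable, scales = "free", ncol = 2) +
  theme_bw() +
  geom_text(data = summ2, aes(label = lab),
            x = -Inf, y = Inf, hjust = -0.1, vjust = 1.2, size = 3)
p3
```

Because summ2 contains the faceting column variable, each panel receives only its own label.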
You can choose either of these options. The function you used may not have worked because of the grid and gridExtra packages.
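On the log x-axis part of the question: besides applying log() inside aes(), ggplot2 also offers scale_x_log10(), which transforms the scale rather than the data, so the tick labels stay in the original units. Note that it uses base 10 whereas log() is the natural log; the shape of the density is the same, only the axis differs. A minimal sketch, where p2 is just an illustrative name:

```r
library(ggplot2)
library(reshape2)

# Long format, as in the code above
df.m <- melt(iris, id = "Species")

# Map the raw values and let the scale apply the log10 transform,
# so axis labels read in the original measurement units
p2 <- ggplot(df.m) +
  geom_density(aes(x = value), fill = "grey", color = "black") +
  scale_x_log10() +
  facet_wrap(~variable, scales = "free", ncol = 2) +
  theme_bw()
p2
```

If you combine this with the hand-positioned labels, the position values would need to be given in the original units instead of log units.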