I want to create a histogram with data from two different conditions (A and B in the example below). I want to plot both distributions in the same plot using geom_histogram in R.
However, it seems that for condition A, the distribution of the whole data set is shown (instead of only A).
In the example below, three cases are shown: 1) A and B plotted together, 2) only A, and 3) only B.
Comparing 1) and 2), you will see that the distribution of A is not the same.
Can anyone explain why this occurs and how to fix this problem?
library(ggplot2)
set.seed(5)
# Create test data frame
test <- data.frame(
condition=factor(rep(c("A", "B"), each=200)),
value =c(rnorm(200, mean=12, sd=2.5), rnorm(200, mean=13, sd=2.1))
)
# Create separate data sets
test_a <- test[test$condition == "A",]
test_b <- test[test$condition == "B",]
# 1) Plot A and B
ggplot(test, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5) +
ggtitle("Test A and B")
# 2) Plot only A
ggplot(test_a, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5) +
ggtitle("Test A")
# 3) Plot only B
ggplot(test_b, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5) +
ggtitle("Test B")
An alternative for visualization, not to supplant MichaelDewar's answer:
# 1) Plot A and B, drawing each condition from the baseline instead of stacking
ggab <- ggplot(test, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5, position = "identity") +
ggtitle("Test A and B") +
xlim(5, 20) +
ylim(0, 13)
# 2) Plot only A
gga <- ggplot(test_a, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5) +
ggtitle("Test A") +
xlim(5, 20) +
ylim(0, 13)
# 3) Plot only B
ggb <- ggplot(test_b, aes(x=value, fill=condition)) +
geom_histogram(binwidth = 0.25, alpha=.5) +
ggtitle("Test B") +
xlim(5, 20) +
ylim(0, 13)
library(patchwork) # solely for a quick side-by-side-by-side presentation
gga + ggab + ggb + plot_annotation(title = 'position = "identity"')
The key in this visualization is adding position = "identity" to the first histogram (the others do not need it, since each contains only one condition).
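One way to check the effect without looking at the picture is to inspect the bar coordinates that ggplot2 computes. The following is a minimal sketch, assuming the same test data as in the question; the object names p_stack, p_identity, d_stack and d_identity are made up for this illustration, and layer_data() is used to pull the computed data of the first layer.

```r
library(ggplot2)

set.seed(5)
test <- data.frame(
  condition = factor(rep(c("A", "B"), each = 200)),
  value = c(rnorm(200, mean = 12, sd = 2.5), rnorm(200, mean = 13, sd = 2.1))
)

# Default position ("stack") versus position = "identity"
p_stack <- ggplot(test, aes(x = value, fill = condition)) +
  geom_histogram(binwidth = 0.25, alpha = .5)
p_identity <- ggplot(test, aes(x = value, fill = condition)) +
  geom_histogram(binwidth = 0.25, alpha = .5, position = "identity")

# Computed bar coordinates for the histogram layer of each plot
d_stack <- layer_data(p_stack)
d_identity <- layer_data(p_identity)

# With "identity" every bar starts at the baseline
range(d_identity$ymin)

# With "stack" some bars start above zero, on top of the other condition
any(d_stack$ymin > 0)
```

If the second check comes back TRUE, some bars in the combined plot do not start at zero, which is why A looks different there than in the A-only plot.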
Alternatively, one could use position = "dodge" (this is best viewed on the console; it's a bit difficult to see on this small snapshot).
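For completeness, a dodged version might look like the sketch below. It reuses the question's test data; the names ggdodge and d_dodge are placeholders chosen here, not part of the original code. With "dodge", the bars of A and B within each bin are drawn side by side, so each bar is narrower than the bin but still starts at zero.

```r
library(ggplot2)

set.seed(5)
test <- data.frame(
  condition = factor(rep(c("A", "B"), each = 200)),
  value = c(rnorm(200, mean = 12, sd = 2.5), rnorm(200, mean = 13, sd = 2.1))
)

# Same histogram as before, but with the two conditions side by side per bin
ggdodge <- ggplot(test, aes(x = value, fill = condition)) +
  geom_histogram(binwidth = 0.25, alpha = .5, position = "dodge") +
  ggtitle('position = "dodge"')

# Computed bar coordinates: each bar starts at the baseline
d_dodge <- layer_data(ggdodge)
range(d_dodge$ymin)

ggdodge
```

With a binwidth as small as 0.25 the dodged bars get very thin, so a larger plotting window (or a larger binwidth) may be needed to read it comfortably.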
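And for perspective, position = "stack", the default, shows "A" with a demonstrably altered histogram: the bars for A are stacked on top of the bars for B, so the outline follows the whole data set rather than A alone.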
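The stacking effect can also be checked numerically in base R, without ggplot2. This is only a sketch: the breaks below are an assumption made for illustration and are not necessarily the exact bin edges ggplot2 chooses, but the arithmetic is the same for any shared set of bins.

```r
set.seed(5)
test <- data.frame(
  condition = factor(rep(c("A", "B"), each = 200)),
  value = c(rnorm(200, mean = 12, sd = 2.5), rnorm(200, mean = 13, sd = 2.1))
)

# Shared bin edges of width 0.25 covering the full range (illustrative choice)
breaks <- seq(floor(min(test$value)), ceiling(max(test$value)), by = 0.25)

# Per-condition and overall bin counts on the same bins
counts_a <- hist(test$value[test$condition == "A"], breaks = breaks, plot = FALSE)$counts
counts_b <- hist(test$value[test$condition == "B"], breaks = breaks, plot = FALSE)$counts
counts_all <- hist(test$value, breaks = breaks, plot = FALSE)$counts

# Top edge of a stacked histogram: A's count added on top of B's count
stacked_top <- counts_a + counts_b

# The stacked outline matches the histogram of the whole data set
identical(stacked_top, counts_all)
```

In other words, in the stacked plot the height of A's colored segment is still A's count, but its top edge traces the combined distribution, which is what makes it look like "the whole data set" is shown for A.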