I have a question about density plots created with ggplot2. When a variable with multiple levels is plotted on an axis, should that axis show the levels ordered from highest to lowest frequency?
I am not sure about the representation I got here:
This is the dataset:
data = data.frame(cos = c(rep('5', 308), rep('3', 199), rep('0', 184), rep('2', 9)),
                  mag = c('Yes', 'No'))
This is how I tried to sort and order the variable plotted on the x axis (cos):
library(data.table)
data = setDT(data)[, freq := .N, by = .(cos)][order(-freq)]
And here is the code for the plot:
library(ggplot2)
ggplot(data) +
  geom_density(aes(x = cos, fill = mag), alpha = 0.4) +
  labs(title = "Density curve", x = "cos", y = "mag") +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        panel.background = element_blank()) +
  theme(axis.ticks.y = element_blank(), axis.ticks.x = element_blank())
Shouldn't 5 be the first point on the x axis, since it has the highest frequency?
What you are describing is not a density plot at all, since the x axis is discrete. A density plot shows the estimated probability density function of a continuous variable, so on a discrete axis the value of a "density" curve anywhere other than directly over each x axis tick is meaningless (and potentially misleading).
From the comments, what you are looking for is effectively just a bar plot, but using bell curves rather than rectangles for the "bars".
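For comparison, here is a minimal sketch of the plain bar plot version, which also shows where the axis order comes from. On a discrete axis ggplot2 follows the factor levels, not the row order of the data, so sorting the rows has no effect on the plot. The data frame below is rebuilt from the question; setting the levels with `table()` and `sort()` is just one way to do it.

```r
library(ggplot2)

# Data as given in the question (mag is recycled along the 700 rows)
data <- data.frame(cos = c(rep("5", 308), rep("3", 199), rep("0", 184), rep("2", 9)),
                   mag = c("Yes", "No"))

# Order the levels of cos by decreasing frequency
freq <- sort(table(data$cos), decreasing = TRUE)
data$cos <- factor(data$cos, levels = names(freq))

# The x axis follows the factor levels, so "5" now comes first
p <- ggplot(data, aes(x = cos, fill = mag)) +
  geom_bar(position = "dodge") +
  labs(y = "Count")
p
```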
I think the best way to do this sort of niche visualization is to work out exactly what you want to plot, wrangle the data into the correct format, then draw it with simple geoms.
It will be easier to use a continuous axis and fake-label it with the discrete levels afterwards. The wrangling might look something like this:
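library(tidyverse)

df <- data %>%
  count(cos, mag) %>%
  # order the levels of cos from most to least frequent
  mutate(cos = reorder(cos, -n)) %>%
  group_by(cos, mag) %>%
  # one bell curve per (cos, mag), centred on the level's position and
  # rescaled so that its peak equals the count n
  # (reframe() needs dplyr >= 1.1.0; older versions can use summarise())
  reframe(x = seq(0, 5, 0.01),
          y = n * dnorm(x, as.numeric(cos), sd = 0.2) /
            dnorm(as.numeric(cos), as.numeric(cos), sd = 0.2))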
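To see what the division by `dnorm()` at the mean is doing, here is a small standalone check in base R. The values of `n` and `mu` are illustrative, not taken from the wrangled data. Dividing by the density at the mean rescales the curve so that its peak is exactly the count, which is what lets the y axis be read as "Count".

```r
n  <- 154    # illustrative count for one (cos, mag) group
mu <- 1      # position of the level on the continuous axis
s  <- 0.2    # same sd as in the wrangling code

x <- seq(0, 5, 0.01)
y <- n * dnorm(x, mu, sd = s) / dnorm(mu, mu, sd = s)

# The normalising constants cancel, leaving a plain Gaussian bump of height n
y_direct <- n * exp(-(x - mu)^2 / (2 * s^2))

max(y)           # the peak height is the count
x[which.max(y)]  # and it sits over the tick position
```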
And the plotting code, if you want the "Yes" and "No" values overlaid, would be:
ggplot(df, aes(x = x, y = y, fill = mag, group = interaction(cos, mag))) +
  geom_area(position = "identity", color = "black", alpha = 0.5) +
  scale_x_continuous("cos", breaks = 1:4, labels = levels(df$cos)) +
  labs(y = "Count") +
  scale_fill_manual(values = c("orange", "deepskyblue4")) +
  theme_minimal(base_size = 20)
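The `breaks = 1:4, labels = levels(df$cos)` trick relies on `as.numeric()` returning a factor's level codes rather than the numbers spelled out in its labels. A minimal sketch, with the level order written out by hand from the question's frequencies (the `axis_map` name is only for illustration):

```r
# Level order by decreasing frequency, as in the question (5 > 3 > 0 > 2)
cos <- factor(c("5", "3", "0", "2"), levels = c("5", "3", "0", "2"))

pos <- as.numeric(cos)                   # level codes: positions on the axis
vals <- as.numeric(as.character(cos))    # the labels read as numbers, not used here

# Break position -> label shown at that tick
axis_map <- setNames(levels(cos), seq_along(levels(cos)))
axis_map
```

So the curve for "5" is centred at x = 1, "3" at x = 2, and so on, and the scale relabels those positions with the original levels.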
If you instead want them stacked, you could do:
ggplot(df, aes(x = x, y = y, fill = mag, group = interaction(cos, mag))) +
  # one layer per level of cos, so stacking happens within each level only
  lapply(split(df, df$cos), function(d) {
    geom_area(position = "stack", color = "black", alpha = 0.5, data = d)
  }) +
  scale_x_continuous("cos", breaks = 1:4, labels = levels(df$cos)) +
  labs(y = "Count") +
  scale_fill_manual(values = c("orange", "deepskyblue4")) +
  theme_minimal(base_size = 20)
If you want them dodged, you would need to wrangle the data a little differently:
df <- data %>%
  count(cos, mag) %>%
  mutate(cos = reorder(cos, -n),
         # centre of each curve: "Yes" sits left of the tick, "No" right of it
         mu = as.numeric(cos) + ifelse(mag == "Yes", -0.1, 0.1)) %>%
  group_by(cos, mag) %>%
  # reframe() needs dplyr >= 1.1.0; older versions can use summarise()
  reframe(x = seq(0, 5, 0.01),
          y = n * dnorm(x, mu, sd = 0.2) / dnorm(mu, mu, sd = 0.2))

ggplot(df, aes(x = x, y = y, fill = mag, group = interaction(cos, mag))) +
  geom_area(position = "identity", color = "black", alpha = 0.5) +
  scale_x_continuous("cos", breaks = 1:4, labels = levels(df$cos)) +
  labs(y = "Count") +
  scale_fill_manual(values = c("orange", "deepskyblue4")) +
  theme_minimal(base_size = 20)