I'm trying to plot the probability density of agreement between pairs of representatives in Congress. The question is how the distribution of agreement on roll call votes within same-party pairs compares to the distribution within cross-party pairs. To do this, I created every distinct pair of members in a given Congress and tracked both members' votes on all roll calls. I then aggregate these and compute, for each pair, the proportion of roll calls on which the two members agreed. As you can imagine, same-party pairs have a high proportion of agreement on roll call votes, while cross-party pairs don't. Because this field is a proportion, its maximum value is 1, meaning that a pair agreed on every roll call vote, and its minimum value is 0. I do this for every possible pair and plot the resulting probability densities.
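A minimal base-R sketch of the pairing step described above, on made-up data. The toy table, its columns (`member`, `party`, `rollnumber`, `vote`) and the 0/1 vote coding are assumptions for illustration, not the actual structure of `h117$votes`; only the output columns `proportion_of_agreements` and `pair_type` mirror the names used in the plotting code.

```r
set.seed(1)

# Hypothetical members: three per party
members <- data.frame(
  member = paste0("m", 1:6),
  party  = rep(c("D", "R"), each = 3)
)

# Hypothetical long vote table: one row per member per roll call
votes <- expand.grid(member = members$member, rollnumber = 1:20,
                     stringsAsFactors = FALSE)
votes$party <- members$party[match(votes$member, members$member)]
# 1 = Yea, 0 = Nay; D members vote Yea more often than R members
votes$vote <- rbinom(nrow(votes), 1, ifelse(votes$party == "D", 0.8, 0.2))

# Every distinct pair of members (one row per pair)
pairs <- t(combn(members$member, 2))

# Proportion of shared roll calls on which the two members cast the same vote
agree <- apply(pairs, 1, function(p) {
  a <- votes[votes$member == p[1], c("rollnumber", "vote")]
  b <- votes[votes$member == p[2], c("rollnumber", "vote")]
  both <- merge(a, b, by = "rollnumber")  # keeps roll calls present for both
  mean(both$vote.x == both$vote.y)
})

party_of <- setNames(members$party, members$member)
pairs_votes_proportions <- data.frame(
  member_1 = pairs[, 1],
  member_2 = pairs[, 2],
  proportion_of_agreements = agree,
  pair_type = unname(ifelse(party_of[pairs[, 1]] == party_of[pairs[, 2]],
                            "same-party", "cross-party"))
)

print(head(pairs_votes_proportions))
```

If abstentions are stored as missing rows, they drop out of the `merge()`, so each proportion is computed only over roll calls where both members voted; how the real data codes abstentions is an assumption here.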
Unfortunately, for some Congresses I run into an issue where the right-hand end of the density curve doesn't come down to the x-axis.
library(ggplot2)
library(dplyr)
h117$pdplot <-
  ggplot(data = h117$pairs_votes_proportions %>%
           filter(proportion_of_agreements < 1 & proportion_of_agreements > 0),
         aes(x = proportion_of_agreements,
             fill = pair_type)) +
  geom_density(adjust = 2,
               alpha = 0.4,
               linewidth = 0.7) +  # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
  scale_fill_grey(start = 0.1, end = 0.8) +
  labs(title = paste("House of Representatives 117;", length(unique(h117$votes$rollnumber)), "roll calls"),
       x = "Proportion of Agreements",
       y = "Density") +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5, face = "bold")) +
  xlim(0, 1)
What I don't understand is why the line wants to go beyond x = 1 at all. Isn't it supposed to stay within the range of x? And second, how do I fix this in ggplot, or with an alternative?
Attempted: adjusting alpha and trim, filtering out the edge values of the data or leaving them in, and switching to stat_density.
There’s no rule that a density curve has to meet the x-axis, and nothing necessarily wrong with your plot. In fact, if the line did touch the x-axis at x = 1, this would be an inaccurate representation of your data. It would communicate that the probability density is 0 at x = 1: essentially, that there are no cases where the proportion of agreement is at (or very close to) 1. But there are such cases, so the probability density at x = 1 is > 0 and the line is necessarily above the x-axis at that point.
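To see where that height at x = 1 comes from, here is a minimal base-R sketch on toy data (a Beta(10, 1) sample standing in for a same-party group; the data and variable names are assumptions, not the question's data). A kernel density estimate places a smooth bump on every observation, so observations just below 1 push part of their bump past 1, and the estimate is still clearly above zero at x = 1 itself.

```r
set.seed(13)
# Toy stand-in for same-party agreement proportions, piled up just below 1
x <- rbeta(250, 10, 1)

# adjust = 2 doubles the default bandwidth, as in the question's geom_density() call
d <- density(x, adjust = 2)

# density() evaluates the estimate up to max(x) + 3 * bandwidth (cut = 3 by default),
# so its grid extends past 1
print(range(d$x))

# Approximate share of the estimated probability mass lying above 1
step <- diff(d$x[1:2])
mass_above_1 <- sum(d$y[d$x > 1]) * step
print(mass_above_1)

# Height of the estimate at x = 1, linearly interpolated from the grid
dens_at_1 <- approx(d$x, d$y, xout = 1)$y
print(dens_at_1)
```

ggplot2's `stat_density()` computes the same kind of estimate but only evaluates it across the x-scale range (0 to 1 here, because of `xlim(0, 1)`), or across each group's own data range with `trim = TRUE`. The tail beyond 1 is therefore simply not drawn, and the curve stops at whatever height it has at the edge. A larger `adjust` widens every bump, which makes this more noticeable.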
So I would leave your plot as is. That said, you can add an outline around the density shapes using outline.type = "full":
set.seed(13)
library(ggplot2)
# example data
dat <- data.frame(
x = c(rbeta(250, 3, 7), rbeta(250, 10, 1)),
grp = rep(c("a", "b"), each = 250)
)
ggplot(dat) +
geom_density(aes(x, fill = grp), alpha = 0.4, outline.type = "full") +
scale_fill_grey(start = 0.1, end = 0.8) +
theme_minimal() +
xlim(0, 1)
To my eye, this is misleading — it looks like group b has a lot of cases with x very close to 1, but none where x equals 1, which isn’t accurate.
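One way to check this numerically is to pull the computed curve out of the plot with ggplot2's `layer_data()` and read its height at the right edge. A sketch reusing the example data (recreated here so the block runs on its own); it assumes groups are numbered alphabetically, so `grp` "a" is group 1 and "b" is group 2.

```r
library(ggplot2)

set.seed(13)
dat <- data.frame(
  x = c(rbeta(250, 3, 7), rbeta(250, 10, 1)),
  grp = rep(c("a", "b"), each = 250)
)

p <- ggplot(dat) +
  geom_density(aes(x, fill = grp), alpha = 0.4) +
  xlim(0, 1)

# Computed values that ggplot2 actually draws for the first (density) layer
ld <- layer_data(p)

# Height of each group's curve at its right-most grid point (x = 1 with xlim(0, 1))
a <- ld[ld$group == 1, ]
b <- ld[ld$group == 2, ]
edge_a <- a$density[which.max(a$x)]
edge_b <- b$density[which.max(b$x)]

print(c(a = edge_a, b = edge_b))
```

Group a's curve is essentially zero at x = 1, while group b's is well above zero, which is exactly what the ungrouped outline hides by drawing a vertical edge down to the axis.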
A compromise might be to add a border to the plot panel rather than to the density shapes:
ggplot(dat) +
geom_density(aes(x, fill = grp), alpha = 0.4) +
scale_x_continuous(limits = c(0, 1), expand = c(0, 0)) +
scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
scale_fill_grey(start = 0.1, end = 0.8) +
theme_minimal() +
theme(panel.border = element_rect(linewidth = 1, fill = NA))
To me at least, this makes the end of the density function look a bit less abrupt, without implying there are 0 cases where x = 1.
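If the concern is rather that a plain kernel estimate is biased downward near a hard bound, one standard correction is reflection: mirror the sample about each bound, estimate, and keep only [0, 1]. The base-R sketch below uses toy data (an assumption, not the question's data). Note that the corrected curve is *higher* at x = 1, not zero, consistent with the point above. Recent ggplot2 versions (3.4.0 and later) document a `bounds` argument for `geom_density()`/`stat_density()` that applies this kind of reflection, e.g. `geom_density(bounds = c(0, 1))`; check `?stat_density` in your installed version before relying on it.

```r
set.seed(13)
x <- rbeta(250, 10, 1)

# Use the bandwidth chosen for the original sample, not for the augmented one
bw <- bw.nrd0(x)

# Uncorrected estimate, evaluated on [0, 1] only
d_raw <- density(x, bw = bw, from = 0, to = 1, n = 512)

# Reflection: mirror the sample about 0 and about 1, estimate, keep [0, 1]
d_ref <- density(c(x, -x, 2 - x), bw = bw, from = 0, to = 1, n = 512)
d_ref$y <- 3 * d_ref$y  # rescale: the augmented sample is three times as large

# Trapezoidal area under the corrected curve on [0, 1] (should be close to 1)
area <- sum(diff(d_ref$x) * (head(d_ref$y, -1) + tail(d_ref$y, -1)) / 2)
print(area)

# Height at the right edge, x = 1: uncorrected vs. corrected
print(c(raw = tail(d_raw$y, 1), reflected = tail(d_ref$y, 1)))
```

The design choice here is to keep the same bandwidth for both estimates, so the only difference between the two curves is the boundary treatment.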