Tags: r, ggplot2, density-plot

Why does my plot utilizing geom_density not reach the x-axis?


I'm trying to plot the probability density of agreement rates between pairs of representatives in Congress. Think: how does the distribution of agreement on roll call votes within same-party pairs compare to that within cross-party pairs? For this, I created every distinct pair of members of Congress and tracked their votes on all roll calls. I then aggregated this to determine the proportion of agreements for each pair. You can imagine that same-party pairs have a high proportion of agreement on roll call votes, while cross-party pairs don't. The maximum value this field can take is 1, meaning that a pair agreed on every roll call vote; the minimum is 0, meaning the pair never agreed. I do this for every possible pair and plot the probability density.

Unfortunately, for some congresses, I run into an issue where the density outline at the right edge of the plot doesn't come down to the x-axis.


library(dplyr)    # provides %>% and filter()
library(ggplot2)

h117$pdplot <-
  ggplot(
    data = h117$pairs_votes_proportions %>%
      # drop pairs that agreed on every roll call or on none
      filter(proportion_of_agreements < 1 & proportion_of_agreements > 0),
    aes(x = proportion_of_agreements, fill = pair_type)
  ) +
  geom_density(adjust = 2,        # doubles the default bandwidth
               alpha = 0.4,
               linewidth = 0.7) + # `size` is deprecated for lines since ggplot2 3.4.0
  scale_fill_grey(start = 0.1, end = 0.8) +
  labs(title = paste("House of Representatives 117;",
                     length(unique(h117$votes$rollnumber)), "roll calls"),
       x = "Proportion of Agreements",
       y = "Density") +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5, face = "bold")) +
  xlim(0, 1)

What I don't understand is why the curve seems to want to extend beyond x = 1 at all. Isn't it supposed to stay within the range of x? And how do I fix this, either in ggplot or with an alternative?

What I've tried: adjusting alpha and trim, filtering out the edge values of the data or leaving them in, and switching to stat_density.


Solution

  • There’s no rule that a density function has to intercept the x axis, and nothing necessarily wrong with your plot. In fact, if the line did intercept the x axis at x = 1, this would be an inaccurate representation of your data. This would be communicating that the probability density is 0 when x = 1 — essentially, that there are no cases where the proportion of agreement is 1. But in fact there are some cases where the proportion of agreement is 1; hence the probability density at x = 1 is > 0, so the line is necessarily above the x axis at that point.
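
    The point that the estimated density is above 0 at x = 1 can be checked numerically. The following is a minimal sketch, assuming that base R's density() with its default Gaussian kernel is a fair stand-in for what geom_density computes, and using simulated data bounded in [0, 1]; the names kde, dens_at_1 and mass_above_1 are only illustrative. It also suggests why the curve seems to want to continue past x = 1: the kernel spreads each observation over a small neighbourhood, so part of the estimated mass lands outside the range of the data.

    ```r
    set.seed(13)

    # Simulated proportions in [0, 1]; the second component piles up near 1
    x <- c(rbeta(250, 3, 7), rbeta(250, 10, 1))

    # Kernel density estimate evaluated on a grid wider than [0, 1]
    kde <- density(x, from = -0.2, to = 1.2, n = 1401)

    # Estimated density at the boundary x = 1 (interpolated on the grid)
    dens_at_1 <- approx(kde$x, kde$y, xout = 1)$y

    # Approximate share of the estimated mass that falls beyond x = 1
    step <- diff(kde$x[1:2])
    mass_above_1 <- sum(kde$y[kde$x > 1]) * step

    max(x)        # no observation exceeds 1
    dens_at_1     # yet the estimate at x = 1 is positive
    mass_above_1  # and some estimated mass sits to the right of 1
    ```

    Cutting the x axis at 1 simply hides that overflow, which is why the outline stops in mid-air instead of dropping to the axis.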

    So I would leave your plot as is. That being said, you can add an outline around the density function using outline.type = "full".

    set.seed(13)
    library(ggplot2)
    
    # example data
    dat <- data.frame(
      x = c(rbeta(250, 3, 7), rbeta(250, 10, 1)),
      grp = rep(c("a", "b"), each = 250)
    )
    
    ggplot(dat) +
      geom_density(aes(x, fill = grp), alpha = 0.4, outline.type = "full") +
      scale_fill_grey(start = 0.1, end = 0.8) +
      theme_minimal() +
      xlim(0, 1)
    

    To my eye, this is misleading — it looks like group b has a lot of cases with x very close to 1, but none where x equals 1, which isn’t accurate.

    A compromise might be to add a border to the plot rather than to the density shapes:

    ggplot(dat) +
      geom_density(aes(x, fill = grp), alpha = 0.4) +
      scale_x_continuous(limits = c(0, 1), expand = c(0, 0)) +
      scale_y_continuous(expand = expansion(mult = c(0, 0.05))) +
      scale_fill_grey(start = 0.1, end = 0.8) +
      theme_minimal() +
      theme(panel.border = element_rect(linewidth = 1, fill = NA))
    

    To me at least, this makes the end of the density function look a bit less abrupt, without implying there are 0 cases where x = 1.
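
    A related option, if the concern is that the estimate spills past limits you know to be hard, is the bounds argument of geom_density. This is a sketch under the assumption that ggplot2 3.4.0 or later is installed (bounds is not available in earlier versions); p_bounded is just an illustrative name. With finite bounds, the tails that would fall outside [0, 1] are reflected back inside, so the curve still ends above the x axis at x = 1 but the estimate no longer places mass beyond the limits.

    ```r
    library(ggplot2)
    set.seed(13)

    # Same kind of example data as above
    dat <- data.frame(
      x = c(rbeta(250, 3, 7), rbeta(250, 10, 1)),
      grp = rep(c("a", "b"), each = 250)
    )

    # bounds = c(0, 1) asks stat_density to correct for the known data limits
    p_bounded <- ggplot(dat) +
      geom_density(aes(x, fill = grp), alpha = 0.4, bounds = c(0, 1)) +
      scale_fill_grey(start = 0.1, end = 0.8) +
      theme_minimal() +
      xlim(0, 1)

    p_bounded
    ```

    Compared with the unbounded version, group b should come out taller near x = 1, since the mass that previously leaked past the limit is folded back in. Whether that is preferable depends on whether 0 and 1 really are hard limits for the quantity being plotted.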