Tags: r, ggplot2, density-plot

Density plot exceeds x-axis interval


I am attempting to make some density plots using ggplot2, but the distribution extends beyond the bounds of my data. Specifically, I am trying to show the distribution of GPS locations in 2 habitat types over time (hour of the day). As I am only interested in displaying the distribution of locations during daylight (0500 to 2100), I have filtered out hours occurring at night. However, when I plot the data, the distribution extends past both hour 5 and hour 21 on the x-axis. I have a feeling it has to do with scale_x_continuous, where I have specified the limits as c(0, 24), but that doesn't explain why the distribution extends beyond daytime hours when there are no data before or after those hours. FYI, I do want the full 0 to 24 h axis to show even though I don't have data for every hour.

But again, I only have data between the hours of 5 and 21. Can someone explain what might be going on here? Hopefully I am making sense. Thanks!
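The behaviour described above can be reproduced without ggplot2. A kernel density estimate places a smooth kernel (Gaussian by default in R's density()) on every observation, so the estimate stays positive for some distance beyond the smallest and largest values. The sketch below uses simulated hours in place of locs.19$hour (an assumption, since the real data are not available) and evaluates the estimate on a 0 to 24 grid, similar to what happens in a density plot whose x range is 0 to 24:

```r
set.seed(42)

# Simulated daylight hours (assumption: stands in for locs.19$hour)
hours <- sample(5:21, 1000, replace = TRUE)

# Kernel density estimate evaluated on a 0-24 h grid (512 points)
d <- density(hours, from = 0, to = 24, n = 512)

# Interpolate the estimate half an hour outside the observed range
before_dawn <- approx(d$x, d$y, xout = 4.5)$y
after_dusk  <- approx(d$x, d$y, xout = 21.5)$y

# Both values are positive even though no observation is < 5 or > 21
print(c(before_dawn = before_dawn, after_dusk = after_dusk))
print(range(hours))
```

The from and to arguments only choose where the estimate is evaluated; they do not change the estimate itself. Evaluating on 5 to 21 instead would simply leave the tails undrawn rather than redistribute that probability mass inside the daylight window.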

Sample code:

# Keep daylight hours only (05:00 to 21:59)
locs.19 <- subset(locs, hour >= 5 & hour <= 21)

> head(locs.19)
     ID         x        y         datetime hour shelfhab
2019_01 -122.9979 37.68930 2019-06-07 05:04    5    inner
2019_01 -122.9977 37.68833 2019-06-07 05:06    5    inner
2019_01 -122.9975 37.68737 2019-06-07 05:08    5    inner
2019_01 -122.9974 37.68644 2019-06-07 05:10    5    inner
2019_01 -122.9974 37.68550 2019-06-07 05:12    5    inner
2019_01 -122.9974 37.68457 2019-06-07 05:14    5    inner

> str(locs.19)
'data.frame' :  6531 obs. of  6 variables:
 $ ID       : chr  "2019_01" "2019_01" "2019_01" "2019_01" ...
 $ x        : num  -123 -123 -123 -123 -123 ...
 $ y        : num  37.7 37.7 37.7 37.7 37.7 ...
 $ datetime : chr  "2019-06-07 05:04" "2019-06-07 05:06" "2019-06-07 05:08" "2019-06-07 05:10" ...
 $ hour     : int  5 5 5 5 5 5 5 5 5 5 ...
 $ shelfhab : chr  "inner" "inner" "inner" "inner" ...

library(ggplot2)

### Plot ###
p19 <- ggplot(locs.19, aes(x = hour))+ 
  geom_density(aes(fill = shelfhab), alpha = 0.4)+
  xlab("Time of Day (24 h)")+
  theme(legend.position = "right",panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        text = element_text(size = 14,family = "Calibri"))+
  scale_x_continuous(breaks=seq(0,24,2),limits = c(0, 24), expand = c(0,1))

p19



Solution

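  • The issue is that you set the limits in scale_x_continuous. By doing so you also set the range over which the density is estimated: geom_density() evaluates the density across the whole range of the x scale, and because each observation contributes a smooth kernel, the curve stays above zero for some distance past hours 5 and 21 and is drawn all the way out to 0 and 24. To achieve your desired result, simply set the limits via coord_cartesian instead. This way the density is only estimated over the range of your data, while you still get an axis ranging from 0 to 24 hours.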

    Using some random example data:

    set.seed(42)
    
    # Example data
    locs.19 <- data.frame(hour = sample(5:21, 1000, replace = TRUE),
                          shelfhab = sample(c("inner", "outer"), 1000, replace = TRUE))
    
    library(ggplot2)
    
    ggplot(locs.19, aes(x = hour))+ 
      geom_density(aes(fill = shelfhab), alpha = 0.4)+
      xlab("Time of Day (24 h)")+
      theme(legend.position = "right",panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
            axis.line = element_line(colour = "black"),
            text = element_text(size = 14))+
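      # No limits on the scale, so the density is computed over the data range only
      scale_x_continuous(breaks=seq(0,24,2), expand = c(0,1)) +
      # Widen the visible x range to 0-24 h without changing the computed density
      coord_cartesian(xlim = c(0, 24))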
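One way to check this explanation is to compare the x grid on which the density was actually evaluated under the two approaches. The sketch below reuses the simulated data from the answer (not the original locs data) and relies on ggplot2's layer_data(), which returns the data computed for a layer; the object names base, p_scale and p_coord are illustrative only:

```r
library(ggplot2)

set.seed(42)
# Simulated example data, as in the answer above
locs.19 <- data.frame(hour = sample(5:21, 1000, replace = TRUE),
                      shelfhab = sample(c("inner", "outer"), 1000, replace = TRUE))

base <- ggplot(locs.19, aes(x = hour)) +
  geom_density(aes(fill = shelfhab), alpha = 0.4)

# Limits on the scale: the density grid spans the whole 0-24 scale range
p_scale <- base + scale_x_continuous(limits = c(0, 24))

# Limits on the coordinate system: the scale (and the density grid)
# still covers only the observed hours
p_coord <- base + coord_cartesian(xlim = c(0, 24))

# layer_data() returns the computed layer data, including the x grid
print(range(layer_data(p_scale)$x))
print(range(layer_data(p_coord)$x))
```

With the limits on the scale, the grid should run from 0 to 24; with coord_cartesian, it should cover only the observed hours (5 to 21 in this simulated data), which is why the curves stop at the edges of the daylight window.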