r ggplot2 zooming density-plot ecdf

Plot ECDF and density in the same plot and zoom in on a specific part


I want to plot the density and the ECDF in the same plot using ggplot2. Here is the code I wrote:

library(ggplot2)
library(reshape)


set.seed(101)
var1 = rnorm(1000, 0.5)
var2 = rnorm(100000,0.5)
combine = melt(data.frame("var1" = var1,"var2"= var2))
ggplot(data = combine) + 
  geom_density(aes(x = value, color = variable), alpha = 0.2)+
  scale_y_continuous(name = "Density",sec.axis = sec_axis(~.*(1*(max(density(var1)$y,density(var2)$y))), name = "Ecdf")) +
  ggtitle("Density and Ecdf plot ") +
  theme_bw() +
  theme(plot.title = element_text(size = 14, family = "Tahoma", face = "bold"),
        text = element_text(size = 12, family = "Tahoma")) +
  scale_fill_brewer(palette="Accent")+
  stat_ecdf(aes(x = value, color = variable))

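This produces the plot below (the black rectangle is not drawn by the code):

[image: density and ECDF curves for var1 and var2, with a black rectangle around the upper-right region]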

However, the axes are not correct: the left y-axis should show the density range (0, 0.4) and the right y-axis should show the ECDF range (0, 1). I also want the two sets of curves scaled so that the maximum of the density (0.4) lines up with the maximum of the ECDF (1).

After this, I want to zoom in on the figure, specifically the upper-right part (the black rectangle, the upper 25%), since the whole plot is not needed. I need two plots: one with the full extent and one zoomed in.

How can this be done using ggplot2?


Solution

  • You can calculate the density and the empirical cumulative distribution function (ECDF) before plotting, then rescale the ECDF so that it shares the density's y-axis. Here I'm using the tidyverse; the purrr::map functions are especially helpful.

    library(tidyverse)
    # density: estimate per variable, keeping the grid (x) and the density (y)
    dens <- combine %>% 
      as_tibble() %>% 
      split(.$variable) %>% 
      map(~density(.x$value) %>% 
            with(.,tibble(x=x, y=y))) %>% 
      bind_rows(.id = "variable") 
    # ecdf: evaluate each variable's ECDF on the same grid as its density
    df <- combine %>% 
      as_tibble() %>% 
      split(.$variable) %>% 
      map2(.,split(dens, dens$variable), ~tibble(Ecdf = ecdf(.x$value)(.y$x))) %>% 
      bind_rows() %>% 
      # only the Ecdf column is added, so bind_cols() creates no duplicate names
      bind_cols(dens,.)
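    # scaling factor: the tallest density value, so that Ecdf = 1 lines up with it
    SCALE <- max(df$y)
    # the plot, stored as p so that it can be zoomed afterwards
    # (the sec_axis transformation is passed positionally because the argument
    #  name differs between ggplot2 versions)
    p <- ggplot(df,aes(x,color=variable)) + 
         geom_line(aes(y=y)) + 
         geom_line(aes(y=Ecdf*SCALE)) +
         scale_y_continuous(name = "Density",sec.axis = sec_axis(~./SCALE, name = "Ecdf"))
    p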
    

    [image: density and rescaled ECDF curves, with Density on the left axis and Ecdf on the right axis]
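The scaling trick can be checked numerically without any plotting. The following is a minimal sketch in base R for a single variable; the names `F_hat`, `scaled_ecdf` and `recovered` are made up for illustration and do not appear in the answer, which takes the scale from the maximum over both variables rather than one. It relies on the fact that `density()` by default evaluates on a grid that extends past the range of the data, so the ECDF reaches 0 at one end of the grid and 1 at the other.

```r
set.seed(101)
var1 <- rnorm(1000, 0.5)

d <- density(var1)             # kernel density estimate on a grid of x values
F_hat <- ecdf(var1)(d$x)       # ECDF evaluated on the same grid

SCALE <- max(d$y)              # tallest point of the density
scaled_ecdf <- F_hat * SCALE   # values drawn against the left (density) axis

# The secondary axis applies the inverse transformation,
# so its labels read as probabilities between 0 and 1
recovered <- scaled_ecdf / SCALE

range(scaled_ecdf)             # lower end is 0, upper end equals SCALE
range(recovered)
```

Multiplying the ECDF by `SCALE` only changes where the curve is drawn; dividing by `SCALE` in `sec_axis()` only changes how the right-hand axis is labelled. The two steps have to be inverses of each other for the labels to match the curve.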

    # zooming: assumes the plot above was stored as p, e.g. p <- ggplot(...)
    p + coord_cartesian(xlim = c(1.5, 5))
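The question also asks for the upper part of the plot and for two separate figures. A possible sketch of that, self-contained and using only base R plus ggplot2, is below. The zoom window is an assumption: "the upper 25%" is read here as ECDF values from 0.75 to 1, and the names `vars`, `zoom_x`, `zoom_y`, `p_full` and `p_zoom` are illustrative. The dashed line type for the ECDF is also an addition, used only to tell the two curve types apart.

```r
library(ggplot2)

set.seed(101)
vars <- list(var1 = rnorm(1000, 0.5), var2 = rnorm(100000, 0.5))

# Density and ECDF per variable, both evaluated on the density grid
df <- do.call(rbind, lapply(names(vars), function(nm) {
  d <- density(vars[[nm]])
  data.frame(variable = nm, x = d$x, y = d$y, Ecdf = ecdf(vars[[nm]])(d$x))
}))

SCALE <- max(df$y)

p <- ggplot(df, aes(x, colour = variable)) +
  geom_line(aes(y = y)) +
  geom_line(aes(y = Ecdf * SCALE), linetype = "dashed") +
  scale_y_continuous(name = "Density",
                     sec.axis = sec_axis(~ . / SCALE, name = "Ecdf"))

# Assumed zoom window: x from 1.5 to 5 and ECDF from 0.75 to 1.
# The y limits are given in density units because the primary axis is the density axis.
zoom_x <- c(1.5, 5)
zoom_y <- c(0.75, 1) * SCALE

p_full <- p                                                  # full extent
p_zoom <- p + coord_cartesian(xlim = zoom_x, ylim = zoom_y)  # zoomed view
```

`coord_cartesian()` only changes the visible window, so the curves keep their shape. Setting limits inside `scale_y_continuous()` would instead drop the points outside the range before the lines are drawn.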