Tags: r, plot, ggplot2, stacked-chart, stacked-area-chart

R: Stacked area chart does not stack


I have data that I want to plot as a stacked area chart. Both axes are continuous: xdata goes on the x-axis, and on the y-axis I plot ydata accumulated within each var (a cumulative sum per group, divided by the total of ydata). This is the code I am using, with some dummy data:

library(data.table)
library(ggplot2)

set.seed(1)
dt <- data.table(var=sample(1:6,1000,replace=TRUE),xdata=runif(1000),ydata=runif(1000))
setorder(dt, var, xdata)

dt$cumydata <- dt[,
                  cumsum(ydata),
                  by = .(var)]$V1/sum(dt$ydata)

ggplot(dt, aes(x = xdata, y = cumydata, fill = as.factor(var))) +
  geom_area(position = "stack")

Here is the output plot: [image: output plot]

My issue is that the areas do not stack correctly. Could this be because the data is continuous?
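
One way to check that guess is sketched below. It assumes that position = "stack" adds up the groups only where they have observations at the same x position, and it reuses the dummy data from above. The names groups_per_x and share_single are made up for this illustration.

```r
library(data.table)

set.seed(1)
dt <- data.table(var = sample(1:6, 1000, replace = TRUE),
                 xdata = runif(1000),
                 ydata = runif(1000))

# For each distinct x value, count how many groups have an observation there
groups_per_x <- dt[, .(n_groups = uniqueN(var)), by = xdata]

# Share of x positions that belong to a single group only
share_single <- mean(groups_per_x$n_groups == 1)
print(share_single)
```

If that share is close to 1, the groups have almost no x positions in common, so there is nothing for the areas to be stacked on.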


Solution

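  • This is how I finally solved it, based on Jimbou's information. It only takes a bit of preprocessing: I bin xdata into the same set of bins for every var, so that all groups have values at the same x positions, and then count and accumulate per bin. I also put the x-axis on a logarithmic scale.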

    library(data.table)
    library(ggplot2)
    
    set.seed(1)
    dtt <- data.table(var=sample(1:6,1000,replace=TRUE),xdata=runif(1000),ydata=runif(1000))
    
    setorder(dtt, var, xdata)
    
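    # Common range of log(xdata), shared by every var
    log.min.xdata <- log(min(dtt$xdata))
    log.max.xdata <- log(max(dtt$xdata))
    
    # 101 break points give 100 bins
    nbreaks <- 101
    
    # Histogram for var == 1, used only to obtain the bin midpoints
    temp <- hist(log(dtt$xdata[dtt$var==1]),
                 breaks = seq(log.min.xdata, log.max.xdata, length.out=nbreaks),
                 plot = FALSE)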
    
    
    # One row per (var, bin); the bin midpoints repeat for every var
    dt <- data.table(var = rep(sort(unique(dtt$var)), each = nbreaks-1),
                     bin = rep(1:(nbreaks-1), times = length(unique(dtt$var))),
                     mid = rep(temp$mids, times = length(unique(dtt$var))))
    
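    # Bin counts per var, using the same breaks for every group
    dt[, count := hist(log(dtt$xdata[dtt$var==var]),
                       breaks = seq(log.min.xdata, log.max.xdata, length.out=nbreaks),
                       plot = FALSE)$counts,
       by = .(var)]
    
    # Cumulative count along the bins, within each var
    dt[, cumcount := cumsum(count), by = .(var)]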
    
    
    
    pp <- ggplot(dt, aes(x = exp(mid), y = cumcount, fill = as.factor(var))) +
      geom_area(position = "stack") +
      scale_x_log10() +
      theme_bw() +
      theme(legend.position = c(0.1, 0.70),
            legend.background = element_rect(fill="lightgrey", 
                                             size=0.5, linetype="solid")) +
      labs(title = "y",
           fill = " var",
           x = "xdata",
           y = "cumcount") +
      theme(title = element_text(face = "bold"),
            axis.title = element_text(face = "bold"),
            legend.title = element_text(face = "bold"),
            legend.text = element_text(face = "bold"))
    
    # Display the plot
    pp
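
The same preprocessing can probably be written more compactly. The following is only a sketch of an alternative, not the code that produced the plot above: it swaps hist() for findInterval() and fills in empty bins with a data.table cross join (CJ). The names edges, mids, counts, grid and binned are made up for this example. findInterval() closes the bins on the left while hist() closes them on the right by default, so a value lying exactly on a break could land in a neighbouring bin.

```r
library(data.table)
library(ggplot2)

set.seed(1)
dtt <- data.table(var = sample(1:6, 1000, replace = TRUE),
                  xdata = runif(1000),
                  ydata = runif(1000))

# Log-spaced bin edges shared by all groups (101 edges give 100 bins)
nbreaks <- 101
edges <- seq(log(min(dtt$xdata)), log(max(dtt$xdata)), length.out = nbreaks)
mids <- (head(edges, -1) + tail(edges, -1)) / 2

# Bin index of each observation; all.inside keeps min and max within 1..100
dtt[, bin := findInterval(log(xdata), edges, all.inside = TRUE)]

# Count per (var, bin), then join onto the full grid so empty bins get a row
counts <- dtt[, .N, by = .(var, bin)]
grid <- CJ(var = sort(unique(dtt$var)), bin = seq_len(nbreaks - 1))
binned <- counts[grid, on = .(var, bin)]
binned[is.na(N), N := 0L]

# Midpoint of each bin and cumulative count within each var
binned[, mid := mids[bin]]
setorder(binned, var, bin)
binned[, cumcount := cumsum(N), by = var]

# Every var now has a value at the same 100 x positions, so the areas can stack
p <- ggplot(binned, aes(x = exp(mid), y = cumcount, fill = factor(var))) +
  geom_area(position = "stack") +
  scale_x_log10()
```

Printing p should give a stacked chart comparable to the one above, without the theme and label settings.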