Tags: r, plot, bar-chart, distribution, cumulative-frequency

How to plot a cumulative distribution in R


I have a data frame and I want to plot its cumulative distribution. My data is:

 dput(gene_snp_distance[1:20, c(1:3)])
structure(list(distance = c(1000, 2000, 3000, 4000, 5000, 6000, 
7000, 8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 16000, 
17000, 18000, 19000, 20000), all_snps = c(10, 11.8, 13.6, 15.4, 
17.2, 19, 20.8, 22.6, 24.4, 26.2, 28, 29.8, 31.6, 33.4, 35.2, 
37, 38.8, 40.6, 42.4, 44.2), gtex_snps = c(12, 14.3, 16.6, 18.9, 
21.2, 23.5, 25.8, 28.1, 30.4, 32.7, 35, 37.3, 39.6, 41.9, 44.2, 
46.5, 48.8, 51.1, 53.4, 56.2)), row.names = c(NA, 20L), class = "data.frame")

And this is what I tried:

plot(gene_snp_distance$distance, gene_snp_distance$all_snps)

But I want a graph like the one below, where the blue bars represent all_snps and the red bars represent gtex_snps.

[image: desired plot]

My plot currently looks like this:

[image: current plot]

I want one bar for each distance, with all_snps and gtex_snps drawn in the same graph, like two bar charts on top of each other.

Does anyone know how to produce a plot like this? Thank you.


Solution

  • Not sure if this is what you want. Since all_Snps (with a capital S) is not defined in your data, I can only guess that you mean all_snps and would like to plot it together with gtex_snps, with one bar per distance.

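    library(tidyverse)

    # Reshape to long format: one row per distance and SNP column
    p <- gene_snp_distance |> 
      pivot_longer(cols = matches("snps"),
                   names_to = "type", 
                   values_to = "value") |> 
      ggplot(aes(distance, value, fill = type))

    # Stacked bars (the default position for geom_col)
    p + geom_col()

    # Side-by-side bars
    p + geom_col(position = "dodge")
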

    Created on 2023-04-02 with reprex v2.0.2

    where

    gene_snp_distance <- structure(list(
      distance = c(1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000,
                   11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000),
      all_snps = c(10, 11.8, 13.6, 15.4, 17.2, 19, 20.8, 22.6, 24.4, 26.2,
                   28, 29.8, 31.6, 33.4, 35.2, 37, 38.8, 40.6, 42.4, 44.2),
      gtex_snps = c(12, 14.3, 16.6, 18.9, 21.2, 23.5, 25.8, 28.1, 30.4, 32.7,
                    35, 37.3, 39.6, 41.9, 44.2, 46.5, 48.8, 51.1, 53.4, 56.2)),
      row.names = c(NA, 20L), class = "data.frame")
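
The question also asks for bars "on top of each other" and for a cumulative view. A minimal sketch of both, under some assumptions: only the first five rows of the question's data are used to keep it short, the blue/red colours follow the question's description, and the `cumsum()` step only makes sense if the columns hold per-bin values rather than totals that are already cumulative (which the question does not say). The names `long`, `p_overlay` and `p_cumulative` are made up for this illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# First five rows of the question's data, to keep the sketch self-contained.
gene_snp_distance <- data.frame(
  distance  = c(1000, 2000, 3000, 4000, 5000),
  all_snps  = c(10, 11.8, 13.6, 15.4, 17.2),
  gtex_snps = c(12, 14.3, 16.6, 18.9, 21.2)
)

# Long format, plus a running total per SNP column.
# ASSUMPTION: values are per-bin, so a cumulative sum is meaningful.
long <- gene_snp_distance |>
  pivot_longer(cols = matches("snps"),
               names_to = "type",
               values_to = "value") |>
  group_by(type) |>
  arrange(distance, .by_group = TRUE) |>
  mutate(cumulative = cumsum(value)) |>
  ungroup()

# Overlaid (not stacked) bars: position = "identity" draws both series
# from zero at the same x position; alpha keeps the one behind visible.
p_overlay <- ggplot(long, aes(distance, value, fill = type)) +
  geom_col(position = "identity", alpha = 0.5) +
  scale_fill_manual(values = c(all_snps = "blue", gtex_snps = "red"))

# Same layout, but showing the running total instead of the raw value.
p_cumulative <- ggplot(long, aes(distance, cumulative, fill = type)) +
  geom_col(position = "identity", alpha = 0.5) +
  scale_fill_manual(values = c(all_snps = "blue", gtex_snps = "red"))

print(p_overlay)
print(p_cumulative)
```

If the values in the question are already cumulative, `p_overlay` alone is probably the closer match to the requested plot, and the `cumsum()` step should be dropped.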