
Dotplot 'binning/grouping' in R


I'm trying to create a dotplot in R, similar to the following plot, where each group is distinctly separated from the rest: http://www.sthda.com/english/wiki/ggplot2-dot-plot-quick-start-guide-r-software-and-data-visualization
[Image: ideal plot]

My data looks as follows. It has a value to plot and a group column that should bin the data into distinct groups (1-5), similar to the 'dose' column in the ToothGrowth dataset used in the previous link:
[Image: my data]

This is the plotting code I'm currently using:

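library(ggplot2)

# First attempt: dots stacked symmetrically around each group
p <- ggplot(new_df, aes(x = group, y = ploidy)) +
  geom_dotplot(binaxis = 'y', stackdir = 'centerwhole', binpositions = 'bygroup',
               binwidth = 0.5, position = "dodge", dotsize = 0.2)

# Second attempt (not assigned to p, so it is only printed)
ggplot(new_df, aes(x = group, y = ploidy)) +
  geom_dotplot(binaxis = 'y', stackdir = 'centerwhole',
               stackratio = 0, dotsize = 0.2, stackgroups = TRUE)

# Add each group's median as a red diamond to the first attempt
p + stat_summary(fun = median, geom = "point", shape = 18,
                 size = 3, color = "red")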

It returns the following plot:

[Image: current plot]

I suspect the issue is that most of the values sit in the 2-3 range, so their stacks overflow into the neighbouring bins/groups.
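One way to check this suspicion is to count how many points land in each y-bin within each group, since the tallest stack determines how far the dots spread sideways. The sketch below is only an approximation: it uses fixed-width bins of `binwidth` (closer to `geom_dotplot(method = "histodot")` than to the default `"dotdensity"` method), `max_stack_per_group` is a hypothetical helper rather than anything in ggplot2, and `sim_df` is simulated data standing in for `new_df`, whose real contents are not shown here.

```r
# Hypothetical helper: for each group, the size of the tallest stack when
# values are cut into fixed-width bins of `binwidth` along the y-axis.
max_stack_per_group <- function(y, group, binwidth) {
  bin <- floor(y / binwidth)      # fixed-width bin index for each value
  counts <- table(group, bin)     # number of points per (group, bin) cell
  apply(counts, 1, max)           # tallest stack within each group
}

# Simulated stand-in for new_df: most values between 2 and 3, groups 1-5
set.seed(1)
sim_df <- data.frame(
  group  = rep(1:5, each = 200),
  ploidy = round(rnorm(1000, mean = 2.5, sd = 0.4), 2)
)

max_stack_per_group(sim_df$ploidy, sim_df$group, binwidth = 0.5)
```

Groups whose tallest stack is much larger than the others are the likely candidates for spilling into their neighbours.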

I tried re-creating the problem with simple datasets such as ToothGrowth, but the issue doesn't reappear in those smaller datasets. Since I can't reproduce it with small sample data, here is a link to my dataset: http://sendanywhe.re/Y5O133EM

Any help would be appreciated.


Solution

  • I think you are overflowing the space allocated to each group in the chart by giving every individual observation a fixed position (sometimes called 'stacking'). Instead, you could 'jitter' the positions of the individual observations within a fixed region for each group. Jittering means adding a small amount of randomness to the position of each point to avoid overplotting (mostly, anyhow).

    I will illustrate this with base R graphics, using the following fictitious data. This focuses attention on what is wrong rather than on the specific ggplot2 solution, which I will leave for you to work out.

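    set.seed(2022)
    # Four fictitious groups with different sizes, means and spreads
    a = round(rnorm(30, 50, 5))
    b = round(rnorm(70, 55, 4))
    c = round(rnorm(55, 40, 6))
    d = round(rnorm(80, 45, 5))
    x = c(a,b,c,d)                  # all values
    g = rep(1:4, c(30,70,55,80))    # group label for each value

    # One vertical strip per group; points are jittered sideways within it
    stripchart(x ~ g, method = "jitter", vertical = TRUE, pch = 20)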

    Sorry, I'm not allowed to post images on this site. Hopefully you get the idea.
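For readers who want the ggplot2 counterpart, a rough sketch follows. It reuses the fictitious data from the base-R example (not the asker's `new_df`) and swaps `geom_dotplot()` for `geom_jitter()`, which is a different technique from dot stacking. `width = 0.2` keeps each group's points within ±0.2 of its position on the discrete x-axis, and `height = 0` leaves the plotted values unchanged. The red median marker mirrors the `stat_summary()` call from the question.

```r
library(ggplot2)

# Fictitious data, generated as in the base-R example
set.seed(2022)
x <- c(round(rnorm(30, 50, 5)), round(rnorm(70, 55, 4)),
       round(rnorm(55, 40, 6)), round(rnorm(80, 45, 5)))
g <- rep(1:4, c(30, 70, 55, 80))
df <- data.frame(group = factor(g), value = x)  # factor: one slot per group

p <- ggplot(df, aes(x = group, y = value)) +
  # jitter horizontally only, within +/- 0.2 of each group's centre
  geom_jitter(width = 0.2, height = 0, size = 1) +
  # red diamond at each group's median, as in the question's code
  stat_summary(fun = median, geom = "point", shape = 18,
               size = 3, color = "red")
print(p)
```

Because the points are no longer stacked, a crowded value range makes a group's cloud denser rather than wider, so it cannot spill into the neighbouring group; adding `alpha = 0.5` to `geom_jitter()` can help when many points overlap.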