
Visualizing relative frequency in R / ggplot2


I've been trying to work out how to visualize a set of relative frequencies so that it is easy to see how they compare with each other. The differences in distribution aren't large, and that in itself is something I consider worth showing. I've managed to create a relatively simple point plot, but I don't think it looks good enough.

The code is straightforward (albeit unfinished as far as visual tweaks are concerned), I guess:

library(ggplot2)
copuladeletion <- read.table(text = "Type    Distribution    Family
                             NP  0.39344 Austronesian    
                             NP  0.30232 Mon-Khmer
                             NP  0.3125  Tai-Kadai
                             NP  0.29230 Sinitic
                             NP  0.26785 Other
                             AdjP    0.44262 Austronesian
                             AdjP    0.53488 Mon-Khmer
                             AdjP    0.625   Tai-Kadai
                             AdjP    0.55384 Sinitic
                             AdjP    0.58928 Other
                             AdvP    0.03278 Austronesian
                             AdvP    0.00000 Mon-Khmer
                             AdvP    0.00000 Tai-Kadai
                             AdvP    0.04615 Sinitic
                             AdvP    0.07142 Other
                             EX  0.01639 Austronesian
                             EX  0.02325 Mon-Khmer
                             EX  0.00000 Tai-Kadai
                             EX  0.03076 Sinitic
                             EX  0.01785 Other
                             Clause  0.08196 Austronesian
                             Clause  0.02325 Mon-Khmer
                             Clause  0.0625  Tai-Kadai
                             Clause  0.03076 Sinitic
                             Clause  0.05357 Other
                             Other   0.01639 Austronesian
                             Other   0.11627 Mon-Khmer
                             Other   0.00000 Tai-Kadai
                             Other   0.04615 Sinitic
                             Other   0.00000 Other", header = TRUE)
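# size is a constant, so it goes outside aes(); inside aes() it is treated
# as data and adds a spurious "size" legend
ggplot(copuladeletion) + geom_point(aes(Distribution, Type, colour = Family), size = 1)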

Which yields the following image:

[Image: point plot of Distribution (x) against Type (y), one point per Family, coloured by Family]

So, my questions are:

Do you think this visualization works well enough? Are there any preferable options over a simple point plot for these data?

Thank you very much in advance!


Solution

  • Perhaps just another take on your strip charts:

    library(ggplot2)
    
    # assumes `txt` holds the same data string passed to read.table() in the question
    copuladeletion <- read.table(text=txt, header=TRUE)
    
    gg <- ggplot(copuladeletion) 
    gg <- gg + geom_point(aes(Distribution, Type, colour=Family),
                          shape="|", size=10)
    gg <- gg + scale_x_continuous(breaks=seq(0, 0.7, 0.1))
    gg <- gg + scale_y_discrete(expand=c(0,0))
    gg <- gg + scale_colour_brewer(name="", palette="Set1")
    gg <- gg + facet_wrap(~Type, ncol=1, scales="free_y")
    gg <- gg + guides(colour=guide_legend(override.aes=list(shape=15, size=3)))
    gg <- gg + labs(x=NULL, y=NULL, title="Family Distribution by Type")
    gg <- gg + theme_bw()
    gg <- gg + theme(panel.grid.major=element_blank())
    gg <- gg + theme(panel.grid.minor=element_blank())
    gg <- gg + theme(strip.background=element_blank())
    gg <- gg + theme(strip.text=element_blank())
    gg <- gg + theme(axis.ticks=element_blank())
    gg <- gg + theme(legend.key=element_blank())
    gg <- gg + theme(legend.position="bottom")
    gg
    

    [Image: strip chart with one row per Type and a coloured tick mark for each Family]

    To partly compensate for the overlaps (as Roman has pointed out a couple of times), you can use a proper line instead of a hack-y point:

    gg <- ggplot(copuladeletion) 
    # linewidth= needs ggplot2 >= 3.4.0 (use size= on older versions)
    gg <- gg + geom_segment(aes(x=Distribution, xend=Distribution,
                                y=0, yend=1, colour=Family), linewidth=0.25)
    gg <- gg + scale_x_continuous(breaks=seq(0, 0.7, 0.1))
    gg <- gg + scale_y_discrete(expand=c(0,0))
    gg <- gg + scale_colour_brewer(name="", palette="Set1")
    # strip.position="left" replaces the deprecated switch="y"
    gg <- gg + facet_wrap(~Type, ncol=1, scales="free_y", strip.position="left")
    gg <- gg + labs(x=NULL, y=NULL, title="Family Distribution by Type")
    gg <- gg + guides(colour=guide_legend(override.aes=list(shape=15, size=3)))
    gg <- gg + theme_bw()
    gg <- gg + theme(panel.border=element_rect(color="#2b2b2b", linewidth=0.15))
    gg <- gg + theme(panel.grid.major=element_blank())
    gg <- gg + theme(panel.grid.minor=element_blank())
    gg <- gg + theme(strip.background=element_blank())
    # horizontal labels on the left strips (styled via strip.text.y.left since ggplot2 3.3.0)
    gg <- gg + theme(strip.text.y.left=element_text(angle=0))
    gg <- gg + theme(axis.ticks=element_blank())
    gg <- gg + theme(legend.key=element_blank())
    gg <- gg + theme(legend.position="bottom")
    gg
    

    [Image: the same strip chart drawn with thin coloured line segments, Type labels on the left]

    You can also map linetype to Family (and hjust the y labels as you like). These thin lines are rather hard to read, so tweak the line width as you see fit, but I do think a strip chart works pretty well for this data. You may also want to break the EX strip out into a separate plot with its own x-axis range, since its values are all close to zero (I have no idea what this data is really trying to say :-)
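A minimal sketch of both suggestions, assuming ggplot2 >= 3.4.0 (for `linewidth=`; older versions use `size=`). It rebuilds the question's data compactly so it runs on its own, maps `linetype` as well as `colour` to `Family`, and then plots the EX rows separately so the x-axis fits their small range. The styling is illustrative rather than the answer's exact theme, and the names `p_all`, `ex` and `p_ex` are just placeholders.

```r
library(ggplot2)

# Same values as the question's data, built compactly (row order matches the question)
copuladeletion <- data.frame(
  Type   = rep(c("NP", "AdjP", "AdvP", "EX", "Clause", "Other"), each = 5),
  Family = rep(c("Austronesian", "Mon-Khmer", "Tai-Kadai", "Sinitic", "Other"), times = 6),
  Distribution = c(0.39344, 0.30232, 0.3125,  0.29230, 0.26785,
                   0.44262, 0.53488, 0.625,   0.55384, 0.58928,
                   0.03278, 0.00000, 0.00000, 0.04615, 0.07142,
                   0.01639, 0.02325, 0.00000, 0.03076, 0.01785,
                   0.08196, 0.02325, 0.0625,  0.03076, 0.05357,
                   0.01639, 0.11627, 0.00000, 0.04615, 0.00000)
)

# (a) Map linetype to Family in addition to colour, so overlapping ticks
#     are easier to tell apart (this also helps in greyscale)
p_all <- ggplot(copuladeletion) +
  geom_segment(aes(x = Distribution, xend = Distribution, y = 0, yend = 1,
                   colour = Family, linetype = Family), linewidth = 0.5) +
  facet_wrap(~Type, ncol = 1, strip.position = "left") +
  scale_colour_brewer(name = "", palette = "Set1") +
  scale_linetype_discrete(name = "") +
  labs(x = NULL, y = NULL) +
  theme_bw() +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(),
        strip.text.y.left = element_text(angle = 0),
        legend.position = "bottom")

# (b) EX values lie between 0 and about 0.03, so they pile up on a shared
#     0-0.7 axis; plotting that strip alone lets the x-axis fit its own range
ex <- subset(copuladeletion, Type == "EX")
p_ex <- ggplot(ex) +
  geom_segment(aes(x = Distribution, xend = Distribution, y = 0, yend = 1,
                   colour = Family, linetype = Family), linewidth = 0.5) +
  scale_colour_brewer(name = "", palette = "Set1") +
  scale_linetype_discrete(name = "") +
  labs(x = NULL, y = NULL, title = "EX only") +
  theme_bw() +
  theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(),
        legend.position = "bottom")

print(p_all)
print(p_ex)
```

Because the colour and linetype scales share the same (empty) name and map the same variable, ggplot2 merges them into a single legend.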