r, plot, t-test

Plot a boxplot using ggplot2 and conduct two-sample t-tests


This is my data. You can open this link https://www.dropbox.com/s/3bypmpojkpnomos/trial1.txt?dl=0

I want to plot a boxplot with male and female on the x-axis and their frequency on the y-axis. From there, I want to conduct two-sample t-tests. Is there any way to differentiate between disturbed and undisturbed habitat too?

This is what I've tried:

# install and load ggplot2
trial1$Sex <- factor(trial1$Sex,labels = c("Female", "Male"))
P1 <- qplot(trial1$Sex, xlab="Host Sex", ylab="Host caught", main="HOSTS CAUGHT VS SEX")
trial1$Habitat <- factor(trial1$Age,labels = c("Disturb", "Undisturb"))
P2 <- qplot(trial1$Habitat, xlab="Habitat", ylab="Host caught", main="HOSTS CAUGHT VS HABITAT")

# calculate frequency
library(plyr)  # can also count using this package
# calculate frequency and make data frame
library(dplyr) # or this package
f1 <- factor(c(Sex))
T1 <- table(f1) #create table of frequency

f2 <- factor(c(Habitat))
T2 <- table(f2)

a1 <- ggplot(data = trial1, aes(x = Sex, y = Freq, colour = Sex)) + 
      geom_boxplot() + xlab("Sex") + ylab("Total ectoparasites") + 
      ggtitle("Sex vs Total ectoparasites")

Solution

  • The first thing you should do with this type of data is to reshape it from wide to long format. This means creating two columns: one holding the original column names (P1, P2, etc.) and one holding the corresponding values.

    library(dplyr)
    library(tidyr)
    library(ggplot2)
    trial1 %>% 
      gather(variable, value, -Habitat, -Sex, -Birds)
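
    To make the reshape concrete, here is a minimal sketch on a small made-up data frame. The name toy, its values, and the assumption that the parasite columns are called P1 and P2 are all hypothetical stand-ins for trial1. It uses pivot_longer(), which tidyr (version 1.0.0 and later) offers as the successor to gather(); the result should contain the same rows, although possibly in a different order.

    ```r
    library(dplyr)
    library(tidyr)

    # Hypothetical toy data standing in for trial1 (values invented for illustration)
    toy <- data.frame(
      Birds   = c("B1", "B2", "B3"),
      Sex     = c("Female", "Male", "Male"),
      Habitat = c("Disturb", "Disturb", "Undisturb"),
      P1      = c(0L, 2L, 1L),
      P2      = c(1L, 0L, 3L)
    )

    # Keep Habitat, Sex and Birds as identifier columns; stack every other
    # column into a name column ("variable") and a value column ("value")
    toy_long <- toy %>%
      pivot_longer(
        cols      = -c(Habitat, Sex, Birds),
        names_to  = "variable",
        values_to = "value"
      )

    toy_long
    ```

    Each bird now contributes one row per parasite column, which is the shape that ggplot2 and group_by() work with most easily.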
    

    I would not recommend a boxplot in this case: given the large number of zero values in the data, it would not be informative. I suggest using geom_jitter to plot counts versus Sex, and using facets to split the plot further by Habitat:

    trial1 %>% 
      gather(variable, value, -Habitat, -Sex, -Birds) %>% 
      ggplot(aes(Sex, value)) + 
      geom_jitter(width = 0.2, alpha = 0.3) + 
      facet_grid(Habitat ~ .) +
      labs(y = "total ectoparasites", title = "Total ectoparasites by Sex and Habitat") +
      theme_light()
    

    There are many ways you could summarise the data for subsequent statistical tests. For example, to get a 2 x 2 table of counts by Sex and Habitat (actually 2 x 3, since Sex is the first column):

    trial1 %>% 
      gather(variable, value, -Habitat, -Sex, -Birds) %>% 
      group_by(Sex, Habitat) %>% 
      summarise(count = sum(value)) %>% 
      spread(Habitat, count)
    
         Sex Disturb Undisturb
    *  <chr>   <int>     <int>
    1 Female       6        23
    2   Male      69       117
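
    Since the question asks about two-sample t-tests, one possible next step is sketched below. It is only an illustration under stated assumptions: toy_long is invented data, and Birds is assumed to identify individual hosts, which the original post does not confirm. The sketch sums the counts per bird and then compares the two sexes with t.test(), which runs a Welch two-sample test by default. Because counts with many zeros are often far from normal, a rank-based wilcox.test() is shown as an alternative.

    ```r
    library(dplyr)

    # Hypothetical long-format data: 8 birds, two parasite columns each
    toy_long <- data.frame(
      Birds    = rep(paste0("B", 1:8), each = 2),
      Sex      = rep(c("Female", "Male"), each = 8),
      Habitat  = rep(rep(c("Disturb", "Undisturb"), each = 2), times = 4),
      variable = rep(c("P1", "P2"), times = 8),
      value    = c(0, 1, 0, 0, 1, 0, 2, 1, 3, 1, 0, 4, 2, 2, 5, 0)
    )

    # One total per bird, so that each host is a single observation
    per_bird <- toy_long %>%
      group_by(Birds, Sex, Habitat) %>%
      summarise(total = sum(value), .groups = "drop")

    # Welch two-sample t-test comparing totals between the sexes
    sex_test <- t.test(total ~ Sex, data = per_bird)
    sex_test

    # Rank-based alternative for skewed, zero-heavy counts
    # (exact = FALSE avoids the warning about ties)
    sex_rank_test <- wilcox.test(total ~ Sex, data = per_bird, exact = FALSE)
    sex_rank_test
    ```

    Replacing the formula with total ~ Habitat compares disturbed and undisturbed habitat in the same way.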