
Superimposing two or more subsets in same plot


I'm trying to visualize a three-level subset of my data in one figure for two different treatments.

I want to visualize the distribution of age for only 1 year (2007), for only one item (tattoo), and for females and males separately.

I am able to reduce my dataset to only females, only in 2007, and only for tattoos using:

with(data[(data$sex=="F") & (data$yy=="2007") & (data$item=="tattoo"),], plot(age, xlab="Age of Females", ylab="Frequency")) 

With this code, I am able to see a frequency distribution of my data. [Figure: 3 tier subset]

But, I am unable, using that code, to do two things:

  1. visualize the data as a density plot

  2. superimpose the multiple tier subset for males

The closest I've been able to come is using this code:

library(sm)
sm.density.compare(age, sex, xlab="Age (years)")
legend(50,0.12, c("Female","Male"), col=c("red", "green"), pch=c(16,16), title="Sex", box.lty=0)

It gives this figure: [Figure: Density plot]

But, with this code, I am unable to get the data to be restricted to the year 2007 and only tattoos.

My question is twofold:

  1. Is it possible to superimpose the male data (for 2007 and tattoos) on the female frequency data?

  2. How can I restrict the density data to only 2007 and tattoos?

I have made a subset of my data available here.

UPDATE: For the frequency histogram, I am trying to visualize the data with the bars for female and male adjacent to each other for each bin.


Solution

  • With standard R plotting you can do as follows:

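    # Density of female ages (2007, tattoos) sets up the plot
    with(data[(data$sex=="F") & (data$yy=="2007") & (data$item=="tattoo"),], plot(density(age)))
    # Overlay the density of male ages for the same year and item
    with(data[(data$sex=="M") & (data$yy=="2007") & (data$item=="tattoo"),], lines(density(age), col = "red"))
    # Hand-drawn legend; the coordinates are in data units, so adjust them to your axes
    segments(50,0.1,52,0.1, col = "black")
    text(52,0.1, pos = 4, labels = "Female")
    segments(50,0.09,52,0.09, col = "red")
    text(52,0.09, pos = 4, labels = "Male")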
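
One caveat with the code above: the axes are set by the female curve alone, so a taller or wider male curve can be clipped, and the legend coordinates depend on the data. A minimal sketch of one way around both, using only base R, is shown here. The data frame is simulated (the values are made up purely so the example runs on its own); only the column names sex, yy, item and age come from the question.

```r
# Simulated stand-in for the asker's data (hypothetical values; the real
# dataset is not reproduced here).
set.seed(1)
data <- data.frame(
  sex  = sample(c("F", "M"), 400, replace = TRUE),
  yy   = sample(c("2006", "2007"), 400, replace = TRUE),
  item = sample(c("tattoo", "piercing"), 400, replace = TRUE),
  age  = round(runif(400, min = 18, max = 70)),
  stringsAsFactors = FALSE
)

# Apply the year and item filters once, then split the ages by sex.
my.subset <- subset(data, yy == "2007" & item == "tattoo")
dens <- lapply(split(my.subset$age, my.subset$sex), density)

# Common axis limits so that neither curve is clipped.
xlim <- range(sapply(dens, function(d) range(d$x)))
ylim <- c(0, max(sapply(dens, function(d) max(d$y))))

plot(dens[["F"]], xlim = xlim, ylim = ylim, col = "black",
     main = "Age distribution, tattoos, 2007", xlab = "Age (years)")
lines(dens[["M"]], col = "red")

# legend() accepts a keyword position, so no data coordinates are needed.
legend("topright", legend = c("Female", "Male"),
       col = c("black", "red"), lty = 1, title = "Sex", bty = "n")
```

Filtering once into my.subset also answers the second part of the question: any function that takes the age and sex vectors can be given my.subset$age and my.subset$sex instead of the unfiltered columns.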
    


    A smoother-looking alternative is to use ggplot2 together with the easyGgplot2 package by kassambara:

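    # easyGgplot2 is installed from GitHub, which requires devtools
    library(devtools)
    install_github("kassambara/easyGgplot2")
    library(easyGgplot2)
    library(ggplot2)
    # Keep both sexes; filter on year and item only
    my.subset <- data[(data$yy=="2007") & (data$item=="tattoo"),]
    # Semi-transparent histograms, one per sex, drawn over each other
    ggplot2.histogram(data=my.subset, xName='age',binwidth = 2,
                      groupName='sex', legendPosition="top",
                      alpha=0.5, position="identity")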
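
The histogram above overlays the two groups. For the layout asked for in the update (female and male bars adjacent to each other in each bin), one possible base R approach is to bin the ages yourself and pass the counts to barplot() with beside = TRUE. This is only a sketch: the data are simulated with made-up values, and the bin width of 5 years is an arbitrary assumption.

```r
# Simulated stand-in for the asker's data (hypothetical values).
set.seed(1)
data <- data.frame(
  sex  = sample(c("F", "M"), 400, replace = TRUE),
  yy   = sample(c("2006", "2007"), 400, replace = TRUE),
  item = sample(c("tattoo", "piercing"), 400, replace = TRUE),
  age  = round(runif(400, min = 18, max = 70)),
  stringsAsFactors = FALSE
)
my.subset <- subset(data, yy == "2007" & item == "tattoo")

# Bin edges at multiples of 5 that cover the observed age range.
breaks <- seq(floor(min(my.subset$age) / 5) * 5,
              ceiling(max(my.subset$age) / 5) * 5, by = 5)
age.bin <- cut(my.subset$age, breaks = breaks,
               right = FALSE, include.lowest = TRUE)

# Rows are sexes (sorted alphabetically: F, M), columns are age bins.
counts <- table(my.subset$sex, age.bin)

# beside = TRUE draws the F and M bars next to each other within each bin.
barplot(counts, beside = TRUE, col = c("grey30", "red"), las = 2,
        ylab = "Frequency",
        legend.text = c("Female", "Male"),
        args.legend = list(x = "topright", bty = "n", title = "Sex"))
```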
    
