
R continuous vs categorical percentage share with geom_line


I'd like to create a ggplot geom_line graph with a continuous variable on the x-axis and the percentage share of one category of a categorical variable on the y-axis. For example, for mtcars I would like hp on the x-axis and the percentage of cars that have 6 cylinders on the y-axis.

library(ggplot2)

ggplot(mtcars, aes(x = hp, y = cyl)) +
  geom_line()

I think it needs to be defined in geom_line by fun.y or something similar.
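For what it's worth, the fun.y idea can be made to work, but through stat_summary() rather than geom_line() itself. The trick is to map y to a 0/1 indicator (cyl == 6) and let stat_summary() average it at each hp value; the mean of a 0/1 variable is a proportion. This is a minimal sketch, assuming ggplot2 3.3.0 or later, where the argument is called `fun` (older versions call it `fun.y`), and using scales::percent (installed with ggplot2) for the axis labels:

```r
library(ggplot2)

# y is a 0/1 indicator; stat_summary() computes its mean at each distinct hp
# value, i.e. the share of cars at that hp that have 6 cylinders.
p <- ggplot(mtcars, aes(x = hp, y = as.numeric(cyl == 6))) +
  stat_summary(fun = mean, geom = "line") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Share of cars with 6 cylinders")

print(p)
```

This avoids reshaping the data first, because the aggregation happens inside the plot layer.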


Solution

  • Compute the frequencies beforehand, for instance with the reshape package:

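    library(reshape)
    library(ggplot2)
    
    # Long format: one row per car, hp as the id variable, cyl as the value
    M <- melt(mtcars, id.vars = "hp", measure.vars = "cyl")
    # Number of cars at each distinct hp value (length is cast()'s default here)
    C <- cast(M, hp ~ variable, fun.aggregate = length)
    # Share of all cars that have each hp value
    C$f <- C$cyl / sum(C$cyl)
    
    ggplot(C, aes(x = hp, y = f)) +
      geom_line()
    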
    

    Note that in this case a line plot doesn't make much sense: the data points are too far apart. You could use a bar plot instead:

    ggplot(C,aes(x=hp,y=f)) +
      geom_bar(stat="identity")
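The cast() call above counts the cars at each hp value, so f is the share of all 32 cars that have a given hp, regardless of how many cylinders they have. If what you want is, for each hp value, the percentage of cars with 6 cylinders, one option is to pass a different aggregation function to cast(). A sketch, assuming the same reshape and ggplot2 setup; the column name f6 is just an illustrative choice:

```r
library(reshape)
library(ggplot2)

# Long format: one row per car, hp as the id variable, cyl as the value
M <- melt(mtcars, id.vars = "hp", measure.vars = "cyl")

# For each distinct hp value, the proportion of cars whose cyl equals 6
C6 <- cast(M, hp ~ variable, fun.aggregate = function(x) mean(x == 6))
C6$f6 <- C6$cyl

p6 <- ggplot(C6, aes(x = hp, y = f6)) +
  geom_bar(stat = "identity")

print(p6)
```

Most hp values in mtcars occur only once, so many of these shares are exactly 0 or 1.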