
How to add a legend to a ggplot in R


Something is wrong with my R code. I want to add a legend for two different variables: "Insgesamt_ALL" in red and "weiblich_ALL" in black.

data <- read.csv("DatensatzUE4_01.csv", sep = ";", header = TRUE)

nDATA <- subset(data, data$Fach == 'Wirtschaftsingenieurw.m.wirtschaftswiss.Schwerpkt.')

library(ggplot2)

p <- ggplot(data = nDATA, aes(x = Semester, fill = y)) +
    ggtitle("GGplot") +
    xlab("Year and Semester") +
    ylab("Total of the student") +
    geom_bar(size = 3, aes(y = Insgesamt_ALL, fill = Semester), stat = "identity", fill = "red") +
    geom_bar(size = 3, aes(y = weiblich_ALL, fill = Semester), stat = "identity", fill = "black")

p2 <- p +
    theme(panel.background = element_rect(fill = "white"),
          panel.grid.major = element_line(colour = "grey", size = 0.1),
          plot.title = element_text(face = "bold"),
          axis.text = element_text(colour = "black"),
          axis.title.x = element_text(face = "bold"),
          axis.text.x = element_text(angle = 90))

plot(p2)

Result: the plot is drawn, but without a legend (screenshot not available).


Solution

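  • Your legend is missing because both fill colors are set to constants (fill="red" and fill="black") outside aes() instead of being mapped to a variable, and ggplot2 only draws a legend for aesthetics that are mapped. So take advantage of the fill aesthetic by mapping it to a variable that distinguishes your two series, Insgesamt_ALL and weiblich_ALL; after that you can change the fill colors as you like.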

    First, let's build some fake data (see @jlhoward's comment) that mimics your actual dataset:

    (tmp_data <- data.frame(Semester = 1:12, Insgesamt_ALL = sample(0:3000, 12), weiblich_ALL = sample(2000:5000, 12)))
    
       Semester Insgesamt_ALL weiblich_ALL
    1         1          2264         2643
    2         2           244         3742
    3         3          1681         2897
    4         4          1037         4342
    5         5          1225         4384
    6         6           478         2195
    7         7            97         2948
    8         8          2537         3509
    9         9          1210         3892
    10       10          2016         2507
    11       11          2524         2415
    12       12           427         4167
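Because sample() draws random numbers, rerunning the line above gives values different from the table. A minimal sketch that fixes the random seed so the fake data is at least reproducible between runs; the seed value 42 is an arbitrary assumption, and the numbers will still not match the table above:

```r
# Fix the random seed so repeated runs produce the same fake data
# (42 is an arbitrary choice; any integer works)
set.seed(42)

tmp_data <- data.frame(
  Semester      = 1:12,
  Insgesamt_ALL = sample(0:3000, 12),    # hypothetical totals
  weiblich_ALL  = sample(2000:5000, 12)  # hypothetical counts of female students
)
print(tmp_data)
```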
    

    The first key point is that you should feed ggplot key/value (long-format) data, with one row per observation and a column naming the series, so let's reshape the dataset:

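    library(tidyr)
    # factor_key = TRUE stores the new "variable" column as a factor (the default
    # is a character vector), so levels() and relevel() below work on it
    nDATA <- gather(tmp_data, variable, count_of_student, Insgesamt_ALL, weiblich_ALL,
                    factor_key = TRUE)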
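In current tidyr (1.0.0 and later), gather() is superseded by pivot_longer(). A sketch of the equivalent reshape, assuming tidyr >= 1.0.0 and using small hypothetical values in place of your data; the name long_data is illustrative only:

```r
library(tidyr)

# Same layout as tmp_data above (values are hypothetical)
tmp_data <- data.frame(
  Semester      = 1:12,
  Insgesamt_ALL = seq(100, 1200, by = 100),
  weiblich_ALL  = seq(50, 600, by = 50)
)

# Wide -> long: one row per (Semester, variable) pair
long_data <- pivot_longer(
  tmp_data,
  cols      = c(Insgesamt_ALL, weiblich_ALL),
  names_to  = "variable",
  values_to = "count_of_student"
)

# pivot_longer() returns the names as a character column; make it a factor
# so that levels() and relevel() work as in the following steps
long_data$variable <- factor(long_data$variable,
                             levels = c("Insgesamt_ALL", "weiblich_ALL"))
head(long_data)
```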
    

    Here I used tidyr::gather, but any other reshaping tool would work as well. Let's plot it straight away:

    library(ggplot2)
    p <- ggplot(nDATA) + geom_bar(aes(x = Semester, y = count_of_student, fill = variable), stat = "identity")
    plot(p)
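geom_bar() stacks bars by default. Since weiblich_ALL is presumably a subset of Insgesamt_ALL, you may prefer the two series side by side. A sketch using geom_col() (shorthand for geom_bar(stat = "identity")) with position = "dodge", on small hypothetical data shaped like nDATA:

```r
library(ggplot2)

# Hypothetical long-format data shaped like nDATA above
long_data <- data.frame(
  Semester         = rep(1:3, times = 2),
  variable         = factor(rep(c("Insgesamt_ALL", "weiblich_ALL"), each = 3),
                            levels = c("Insgesamt_ALL", "weiblich_ALL")),
  count_of_student = c(300, 320, 310, 120, 130, 125)
)

# position = "dodge" places the bars for each Semester next to each other
# instead of stacking them, so every bar starts at zero
p_dodge <- ggplot(long_data, aes(x = Semester, y = count_of_student, fill = variable)) +
  geom_col(position = "dodge")
print(p_dodge)
```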
    


    What you are after is essentially replacing the default fill scale with custom colors (black and red):

    fill_colors        <- c("#000000", "#FF0000")
    names(fill_colors) <- levels(nDATA$variable)
    fill_scale <- scale_fill_manual(name = "Variable", values = fill_colors)
    

    p + fill_scale
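scale_fill_manual() also accepts breaks and labels, which control which legend entries appear, in what order, and what text they show. A sketch on hypothetical data; the English labels "Total" and "Female" are an illustrative choice (Insgesamt means total, weiblich means female):

```r
library(ggplot2)

# Hypothetical long-format data shaped like nDATA above
long_data <- data.frame(
  Semester         = rep(1:3, times = 2),
  variable         = factor(rep(c("Insgesamt_ALL", "weiblich_ALL"), each = 3),
                            levels = c("Insgesamt_ALL", "weiblich_ALL")),
  count_of_student = c(300, 320, 310, 120, 130, 125)
)

fill_colors <- c(Insgesamt_ALL = "#000000", weiblich_ALL = "#FF0000")

# breaks: which data values get a legend key, and in which order
# labels: the text shown for each break, matched by position
fill_scale <- scale_fill_manual(
  name   = "Variable",
  values = fill_colors,
  breaks = c("Insgesamt_ALL", "weiblich_ALL"),
  labels = c("Total", "Female")
)

p <- ggplot(long_data, aes(x = Semester, y = count_of_student, fill = variable)) +
  geom_bar(stat = "identity") +
  fill_scale
print(p)
```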


    Finally, let's switch black and red fill colors by reordering the levels of the variable factor:

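    # relevel() needs a factor; factor() keeps the level order of a column that already is one
    nDATA$variable         <- relevel(factor(nDATA$variable), ref = "weiblich_ALL")
    new_fill_colors        <- fill_colors
    names(new_fill_colors) <- levels(nDATA$variable)
    new_fill_scale <- scale_fill_manual(name = "Variable", values = new_fill_colors)
    # ggplot() stores its own copy of the data, so rebuild p for the new level order to take effect
    p <- ggplot(nDATA) + geom_bar(aes(x = Semester, y = count_of_student, fill = variable), stat = "identity")
    p + new_fill_scale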
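When the values vector passed to scale_fill_manual() is named, ggplot2 matches colors to data values by name, so another way to get red for Insgesamt_ALL and black for weiblich_ALL is to write that mapping down directly instead of releveling. A sketch on hypothetical data; fill_colors_by_name is an illustrative name:

```r
library(ggplot2)

# Hypothetical long-format data shaped like nDATA above
long_data <- data.frame(
  Semester         = rep(1:3, times = 2),
  variable         = factor(rep(c("Insgesamt_ALL", "weiblich_ALL"), each = 3),
                            levels = c("Insgesamt_ALL", "weiblich_ALL")),
  count_of_student = c(300, 320, 310, 120, 130, 125)
)

# Named vector: each series gets its color by name, regardless of level order
fill_colors_by_name <- c(Insgesamt_ALL = "#FF0000",  # red
                         weiblich_ALL  = "#000000")  # black

p <- ggplot(long_data, aes(x = Semester, y = count_of_student, fill = variable)) +
  geom_bar(stat = "identity") +
  scale_fill_manual(name = "Variable", values = fill_colors_by_name)
print(p)
```

Releveling is still useful if you also want to change the legend order or the stacking order.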
    


    You should be on the right track now.
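If you would rather keep the data in wide format, as in the original question, a commonly used alternative is to map fill to a constant label inside aes() in each layer; because fill is then mapped rather than set, ggplot2 builds a legend from those labels. A sketch with hypothetical data; wide_data and p_wide are illustrative names:

```r
library(ggplot2)

# Hypothetical wide data in the same shape as the question's nDATA
wide_data <- data.frame(
  Semester      = 1:4,
  Insgesamt_ALL = c(300, 320, 310, 330),
  weiblich_ALL  = c(120, 130, 125, 140)
)

# Each layer maps fill to a fixed string; scale_fill_manual() then assigns
# a color to each string and the legend shows both entries
p_wide <- ggplot(wide_data, aes(x = Semester)) +
  geom_col(aes(y = Insgesamt_ALL, fill = "Insgesamt_ALL")) +
  geom_col(aes(y = weiblich_ALL,  fill = "weiblich_ALL")) +
  scale_fill_manual(name = "Variable",
                    values = c(Insgesamt_ALL = "red", weiblich_ALL = "black"))
print(p_wide)
```

The second layer is drawn over the first, so this overlay only shows both series when each weiblich_ALL value does not exceed the matching Insgesamt_ALL value.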