
Formatting Geom_Bar in ggplot2


Hi, I am having some trouble figuring out how to properly format the stacked bar plot I am trying to produce in ggplot2. I have searched through previous questions, but none of them seem to address the problems I am running into.

[Chart 1: geom_bar using 1 and 2 as dummy variables]

The first chart is close to what I want, but the fill legend on the side shows 5 values instead of just '1' and '2', which are the only two values in the data frame. Essentially, I am trying to make the fill scale show only the '1' and '2' values and, if possible, relabel them as 'Yes' and 'No'. Below is the code:

    ggplot(AggSignedDummyVar, aes(fill=AggSignedDummyVar$`Signed by Drafting Club`, x = AggSignedDummyVar$`College Conference`, y = MLS_Draft_File$`Signed by Drafting Club`)) + 
      xlim('American Athletic Conference', 'Atlantic-10 Conference', 'Atlantic Coast Conference', 'Big East Conference', 'Big West Conference', 'Ivy League', 'Mid-American Conference', 'Pac-12 Conference', 'West Coast Conference') 

I also tried rewriting the code above using 'Yes' and 'No' instead of the dummy variables. This version seems to keep count of the occurrences but doesn't display them, and it puts 'Yes' and 'No' as labels on the lower portion of the y-axis (they shouldn't be there).

[Chart 2: geom_bar without the dummy variable]

The code for this version is below:

    ggplot(MLS_Draft_File_Aggregated_Non_Numeric_, aes(fill=MLS_Draft_File_Aggregated_Non_Numeric_$`Signed by Drafting Club`, x = MLS_Draft_File_Aggregated_Non_Numeric_$`College Conference`, y = MLS_Draft_File_Aggregated_Non_Numeric_$`Signed by Drafting Club`)) + 
      xlim('American Athletic Conference', 'Atlantic-10 Conference', 'Atlantic Coast Conference', 'Big East Conference', 'Big West Conference', 'Ivy League', 'Mid-American Conference', 'Pac-12 Conference', 'West Coast Conference') 

Hopefully I explained this properly. Thank you in advance for any help you can provide.


Solution

  • When you map a variable inside `ggplot2::aes()`, ggplot chooses a scale based on the variable's type. If the data is numeric, it is treated as continuous, even if it contains only 2 distinct values. If you pass a discrete variable (a factor or a character vector), ggplot uses a discrete scale instead.

    Consider the following two plots:

    library(ggplot2)
    ggplot(mtcars, aes(x=mpg, y=hp, fill=cyl)) + geom_bar(stat="identity")
    

    [Plot: continuous variable for fill]

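    The fill variable is numeric, so ggplot treats it as continuous and draws a gradient colour bar with intermediate breaks. This is why the legend in the question shows more values than just 1 and 2.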

    But here:

    ggplot(mtcars, aes(x=mpg, y=hp, fill=factor(cyl))) + geom_bar(stat="identity")
    

    [Plot: factor for fill]

    Here we convert `cyl` to a factor before passing it to `aes()`. A character vector would also work, but a factor has the advantage that you can specify the order of the levels, and ggplot uses that order in the legend.
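Applied to the question, the same idea also handles the 'Yes'/'No' relabelling: `factor()` with `levels` and `labels` turns the 1/2 code into a two-level factor, so the legend shows exactly two entries. The sketch below is a minimal illustration, not the asker's actual data: the data frame `draft` and its rows are made up, only the column names come from the question, and it assumes 1 means "Yes" and 2 means "No" (swap the labels if the coding is the other way round).

```r
library(ggplot2)

# Hypothetical stand-in for the question's data: one row per drafted player,
# with the dummy variable coded 1/2 (assumed here: 1 = "Yes", 2 = "No").
draft <- data.frame(
  `College Conference` = c("Big East Conference", "Big East Conference",
                           "Ivy League", "Pac-12 Conference",
                           "Pac-12 Conference", "Pac-12 Conference"),
  `Signed by Drafting Club` = c(1, 2, 2, 1, 1, 2),
  check.names = FALSE  # keep the spaces in the column names
)

# Convert the numeric code to a factor with exactly two relabelled levels,
# so ggplot uses a discrete fill scale with two legend entries.
draft$`Signed by Drafting Club` <- factor(
  draft$`Signed by Drafting Club`,
  levels = c(1, 2),
  labels = c("Yes", "No")
)

# Refer to columns by name inside aes() rather than via data$column.
# With no y aesthetic, geom_bar() counts the rows in each x/fill group
# and stacks the counts.
p <- ggplot(draft, aes(x = `College Conference`,
                       fill = `Signed by Drafting Club`)) +
  geom_bar()

print(p)
```

Leaving `y` unmapped is what lets `geom_bar()` count the occurrences; mapping `y` to the 'Yes'/'No' column instead makes it a discrete y-axis, which would explain the labels appearing on the y-axis in the second attempt. If you prefer to keep the data numeric and only change what the legend says, converting inside `aes()` (`fill = factor(...)`) and adding `scale_fill_discrete(labels = c("Yes", "No"))` is an alternative.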