Hi, I am having trouble formatting a stacked bar plot in ggplot2. I have searched through previous questions, but none of them seem to address the problems I am running into.
(Figure: Geom_Bar using 1 + 2 as dummy variables)
The first attached chart is close to what I want, but the fill scale on the side shows 5 values, whereas '1' and '2' are the only two values in the data frame. Essentially, I am trying to restrict the fill scale to just the '1' and '2' values and, if possible, relabel them as 'Yes' and 'No'. The code is below:
ggplot(AggSignedDummyVar, aes(fill=AggSignedDummyVar$`Signed by Drafting Club`, x = AggSignedDummyVar$`College Conference`, y = MLS_Draft_File$`Signed by Drafting Club`)) +
xlim('American Athletic Conference', 'Atlantic-10 Conference', 'Atlantic Coast Conference', 'Big East Conference', 'Big West Conference', 'Ivy League', 'Mid-American Conference', 'Pac-12 Conference', 'West Coast Conference')
I also tried rewriting the code above using 'Yes' and 'No' in place of the dummy variables. This version seems to keep a count of the occurrences but doesn't display them, and it attaches 'Yes' and 'No' to the lower portion of the y-axis, where they shouldn't be.
(Figure: Geom_bar but without Dummy Variable)
The code for this version is below:
ggplot(MLS_Draft_File_Aggregated_Non_Numeric_, aes(fill=MLS_Draft_File_Aggregated_Non_Numeric_$`Signed by Drafting Club`, x = MLS_Draft_File_Aggregated_Non_Numeric_$`College Conference`, y = MLS_Draft_File_Aggregated_Non_Numeric_$`Signed by Drafting Club`)) +
xlim('American Athletic Conference', 'Atlantic-10 Conference', 'Atlantic Coast Conference', 'Big East Conference', 'Big West Conference', 'Ivy League', 'Mid-American Conference', 'Pac-12 Conference', 'West Coast Conference')
Hopefully I explained this properly and thank you in advance for any help you can provide.
If you pass a variable to the ggplot2::aes function, ggplot tries to guess how you want to use that data. If the data is numeric, it is treated as a continuous variable, even if it contains only 2 distinct values. If you pass a discrete variable (a factor or a character vector), ggplot treats it as discrete.
Consider the following two plots:
library(ggplot2)
ggplot(mtcars, aes(x=mpg, y=hp, fill=cyl)) + geom_bar(stat="identity")
The variable for fill is numeric, so ggplot treats it as continuous.
But here:
ggplot(mtcars, aes(x=mpg, y=hp, fill=factor(cyl))) + geom_bar(stat="identity")
We re-cast cyl as a factor before passing it to aes. (We could also use a character vector, but a factor has the advantage that we can specify the order of the levels, and ggplot will use that ordering.)
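Applied to the question, the same idea might look like the sketch below. The data frame `draft` and its columns `conference` and `signed` are made-up stand-ins for the asker's data, and the mapping of 1 to 'Yes' and 2 to 'No' is an assumption; swap the labels if the coding runs the other way.

```r
library(ggplot2)

# Hypothetical stand-in for the asker's data: one row per drafted player.
draft <- data.frame(
  conference = c("Ivy League", "Ivy League", "Big East Conference",
                 "Big East Conference", "Big East Conference",
                 "Pac-12 Conference"),
  signed = c(1, 2, 1, 1, 2, 2)
)

# Assumption: 1 means "Yes" and 2 means "No".
# factor() makes the variable discrete and relabels it in one step;
# the order of `levels` also fixes the order used in the legend.
draft$signed <- factor(draft$signed, levels = c(1, 2), labels = c("Yes", "No"))

# Leave y unmapped so geom_bar() counts the rows in each group and stacks
# the counts by fill. Columns are referenced by bare name inside aes().
p <- ggplot(draft, aes(x = conference, fill = signed)) +
  geom_bar() +
  labs(x = "College Conference", y = "Number of players",
       fill = "Signed by Drafting Club")
print(p)
```

Leaving y unmapped should also address the second attempt: mapping the 'Yes'/'No' column to y is what puts those labels on the y-axis. Referring to columns by bare name rather than with `$` keeps every aesthetic tied to the data frame given to ggplot().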