I'm working in RStudio.
With ggplot2, I'm trying to plot the frequencies of a discrete variable (number of shares purchased) per category (there are 5 categories). For example, members of category A might buy 1 share more often than members of category D.
I currently have a count plot. However, because one category is much larger than the others, the distribution of the number of shares in the smaller categories is hard to see.
The code of the count plot is as follows:
# Absolute distribution of shares per category
ggplot(dat, aes(x=Number_share, fill=category)) +
  geom_histogram(binwidth=.5, alpha=.5, position="dodge")
This results in this graph: https://i.sstatic.net/QRyx6.jpg
Therefore, I would like to make a plot that shows, instead of absolute counts, the distribution of share numbers relative to the size of each category.
I calculated the relative frequencies of each category:
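library(MASS)
# Count the rows per category, then divide by the total number of rows
categories <- dat$category
categories.freq <- table(categories)
categories.relfreq <- categories.freq / nrow(dat)
# Print the relative frequencies as a one-column matrix
cbind(categories.relfreq)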
           categories.relfreq
Beauvent 1        0.002708692
Beauvent 2        0.015020931
E&B               0.037182960
Ecopower 1        0.042107855
Ecopower 2        0.029549372
Ecopower 3        0.873183945
I don't know how to make a plot in which the frequency of each number of shares is relative to its category rather than absolute. Can anybody help me with this?
I think what you are looking for is this:
ggplot(dat, aes(x=Number_share, fill=category)) +
geom_bar(position="fill")
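This stacks the categories on top of each other, and the position="fill" argument scales every bar to a height of 1, so each bar shows the proportion of each category within that number of shares rather than the absolute count.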
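Note that position="fill" normalises within each value of Number_share. If the goal is instead the share of each Number_share value within its own category, one possible approach is to compute those proportions first and plot them with geom_col. The following is only a minimal sketch: the data frame below is a made-up stand-in for dat (only the column names Number_share and category come from the question), and rel and rel_freq are names introduced here for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for `dat`; replace with the real data frame.
set.seed(1)
dat <- data.frame(
  category = c(rep("Ecopower 3", 80), rep("E&B", 12), rep("Beauvent 2", 8)),
  Number_share = c(sample(1:5, 80, replace = TRUE),
                   sample(1:5, 12, replace = TRUE),
                   sample(1:5, 8, replace = TRUE))
)

# Cross-tabulate category by number of shares, then divide each row by its
# row total (margin = 1), so the proportions within every category sum to 1.
rel <- as.data.frame(
  prop.table(table(dat$category, dat$Number_share), margin = 1)
)
names(rel) <- c("category", "Number_share", "rel_freq")

# Dodged bars: one bar per category at each number of shares.
p <- ggplot(rel, aes(x = Number_share, y = rel_freq, fill = category)) +
  geom_col(position = "dodge") +
  labs(y = "Relative frequency within category")
p

# Alternative without precomputing (assumes ggplot2 >= 3.3.0 for after_stat):
# ggplot(dat, aes(x = Number_share, fill = category)) +
#   geom_bar(aes(y = after_stat(prop), group = category), position = "dodge")
```

Because every category is scaled by its own total, the small categories become readable next to the large one; the trade-off is that the plot no longer shows how many observations each category contains.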