
R: relative frequencies of categorical data in ggplot2


I'm working in RStudio.

Using ggplot2, I'm trying to create a plot that shows the frequencies of a categorical variable (number of shares purchased) within each category (there are six categories, listed below). For example, members of category A might buy 1 share more often than members of category D.

I currently have a count plot. However, because one category is much larger than the others, it is hard to see how the number of shares is distributed in the other categories.

The code of the count plot is as follows:

library(ggplot2)

# Absolute distribution of the number of shares per category
ggplot(dat, aes(x=Number_share, fill=category)) +
  geom_histogram(binwidth=.5, alpha=.5, position="dodge")

This produces the following graph: https://i.sstatic.net/QRyx6.jpg

I would therefore like a plot that shows, instead of absolute counts, the distribution of the number of shares relative to the size of each category.

I calculated the relative frequency of each category (its share of all rows):

# Share of all rows that falls in each category (no extra package needed)
categories = dat$category
categories.freq = table(categories)
categories.relfreq = categories.freq / nrow(dat)
cbind(categories.relfreq)

           categories.relfreq
Beauvent 1        0.002708692
Beauvent 2        0.015020931
E&B               0.037182960
Ecopower 1        0.042107855
Ecopower 2        0.029549372
Ecopower 3        0.873183945
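Dividing by nrow(dat) gives each category's share of the whole data set, which is not quite the quantity the desired plot needs. Within-category relative frequencies come from dividing each row of a category × number-of-shares table by its row total. A minimal base-R sketch, assuming `dat` has the columns `category` and `Number_share` used above; the data frame built here is hypothetical toy data, not the real data set:

```r
# Hypothetical toy data standing in for `dat`; the real data set is not shown.
dat <- data.frame(
  category     = rep(c("A", "D"), times = c(4, 8)),
  Number_share = c(1, 1, 2, 3,  1, 2, 2, 2, 3, 3, 3, 3)
)

# Two-way table: rows = categories, columns = number of shares purchased
counts <- table(dat$category, dat$Number_share)

# margin = 1 divides every cell by its row total, so each row sums to 1
rel_freq <- prop.table(counts, margin = 1)

# A: 0.500 0.250 0.250 / D: 0.125 0.375 0.500
round(rel_freq, 3)
```

With real data, the largest category no longer dominates: every row is on the same 0 to 1 scale regardless of how many members the category has.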

However, I don't know how to make a plot in which the frequency of each number of shares purchased is expressed relative to its category rather than as an absolute count. Can anybody help me with this?


Solution

  • I think what you are looking for is this:

    ggplot(dat, aes(x=Number_share, fill=category)) +
      geom_bar(position="fill")
    

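    This stacks the categories on top of each other, and the position="fill" argument rescales every bar to a height of 1, so the y-axis shows relative counts: the proportion of each category among all purchases of a given number of shares. Note that this normalizes within each value of Number_share, not within each category.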
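If the goal is instead the distribution of Number_share within each category (each category's bars summing to 1), one option is to plot the `prop` variable that stat_count() computes per group. This is a sketch assuming ggplot2 3.3.0 or later (for after_stat()); the toy `dat` below is hypothetical:

```r
library(ggplot2)

# Hypothetical toy data standing in for `dat`
dat <- data.frame(
  category     = rep(c("A", "D"), times = c(4, 8)),
  Number_share = c(1, 1, 2, 3,  1, 2, 2, 2, 3, 3, 3, 3)
)

# stat_count() computes `prop` = count / total count within each group.
# group = category (already implied by the discrete fill, stated explicitly
# here) makes the proportions sum to 1 within each category.
p <- ggplot(dat, aes(x = Number_share, y = after_stat(prop),
                     fill = category, group = category)) +
  geom_bar(position = "dodge") +
  labs(y = "Relative frequency within category")

p
```

Each category's bars now sum to 1, so small categories are not dwarfed by the largest one. Adding facet_wrap(~ category) draws each category in its own panel, which can be easier to read when there are many categories.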