r, distribution, frequency-distribution

Ranked Frequency Distributions from Nominal Variables in R


I have searched through the website but have been unable to find a solution to my problem. I have a sample dataset as follows:

id,l1
1,3
2,5
3,6
1,5
2,4
3,6

id is a nominal variable that identifies a unique user, and l1 is a count variable.

I want to find the distribution of l1 by user. In the dataset above, id = 1 has a total l1 of 8, id = 2 has a total of 9, and id = 3 has a total of 12.

I am trying to find out the distribution of l1 according to id but I am stuck. I cannot figure out how to group the relevant columns together and then find the distribution or at least construct a histogram. I can construct a histogram with one variable but I cannot construct a ranked frequency distribution by a nominal variable.


Solution

  • The base R approach is to use tapply.

    Assuming your data.frame is called aa:

    sumById <- with(aa, tapply(l1, id, sum))
    
    barplot(sumById)
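
    Since the question asks for a ranked distribution, one option is to sort the per-id totals before plotting. The following is a minimal sketch, not the only way to do it: it rebuilds the sample data from the question under the name aa, and the variable name rankedSums and the axis labels are only illustrative.

    ```r
    # Rebuild the sample data from the question (named `aa` as in the answer)
    aa <- data.frame(
      id = c(1, 2, 3, 1, 2, 3),
      l1 = c(3, 5, 6, 5, 4, 6)
    )

    # Total l1 per id
    sumById <- with(aa, tapply(l1, id, sum))

    # Rank the totals from largest to smallest; the id labels are kept as names
    rankedSums <- sort(sumById, decreasing = TRUE)

    # Bars are drawn in rank order and labelled by id
    barplot(rankedSums, xlab = "id", ylab = "total l1")
    ```

    With the sample data the bars appear in the order id 3, 2, 1, matching the totals 12, 9 and 8 given in the question.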
    


    If you want to plot the results without explicitly summarizing the data first, you can use ggplot2 and stat_summary:

    library(ggplot2)
    # factor(id) treats id as nominal; `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
    ggplot(aa, aes(x = factor(id), y = l1)) + stat_summary(fun = sum, geom = 'bar')
    

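
    The ggplot2 version can be ranked as well by reordering the levels of id by their summed l1. This is a sketch under a few assumptions: ggplot2 3.3.0 or later is installed (for the `fun` argument), the data frame is rebuilt from the question's sample, and the column name id_ranked is only illustrative.

    ```r
    library(ggplot2)

    # Rebuild the sample data from the question (named `aa` as in the answer)
    aa <- data.frame(
      id = c(1, 2, 3, 1, 2, 3),
      l1 = c(3, 5, 6, 5, 4, 6)
    )

    # reorder() sorts factor levels by ascending score; negating l1 and
    # summing per id therefore puts the largest total first
    aa$id_ranked <- reorder(factor(aa$id), -aa$l1, FUN = sum)

    # stat_summary() sums l1 within each id at plot time
    p <- ggplot(aa, aes(x = id_ranked, y = l1)) +
      stat_summary(fun = sum, geom = "bar") +
      labs(x = "id", y = "total l1")
    print(p)
    ```

    Doing the ranking in the factor levels keeps the summarizing inside stat_summary, so no separate table of totals is needed.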