I have searched through the website but have been unable to find a solution to my problem. I have a sample dataset as follows:
id,l1
1,3
2,5
3,6
1,5
2,4
3,6
id is a nominal variable that identifies a unique user, and l1 is a count variable.
What I want is to find out the distribution of l1 by user. So, looking at my given dataset, id=1 has total l1 = 8; id = 2 has total l1 = 9 and id=3 has total l1 = 12.
I am trying to find out the distribution of l1 according to id but I am stuck. I cannot figure out how to group the relevant columns together and then find the distribution or at least construct a histogram. I can construct a histogram with one variable but I cannot construct a ranked frequency distribution by a nominal variable.
The base R approach would be to use tapply. If your data.frame is called aa:
sumById <- with(aa, tapply(l1, id, sum))
barplot(sumById)
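Since the question asks for a ranked distribution, here is a minimal self-contained sketch of the same tapply approach on the sample data, with the totals sorted before plotting. The name rankedSums and the axis labels are my own choices for illustration, not part of the original answer.

```r
# Sample data from the question
aa <- data.frame(id = c(1, 2, 3, 1, 2, 3),
                 l1 = c(3, 5, 6, 5, 4, 6))

# Total l1 per id; the result is named by id
sumById <- with(aa, tapply(l1, id, sum))

# Order the users from largest to smallest total
# (rankedSums is an illustrative name)
rankedSums <- sort(sumById, decreasing = TRUE)
print(rankedSums)

# One bar per user, tallest first
barplot(rankedSums, xlab = "id", ylab = "total l1")
```

With the sample data, this puts id 3 first (total 12), then id 2 (9), then id 1 (8), matching the totals given in the question.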
If you want to plot your results without explicitly summarizing first, you could use ggplot2 and stat_summary:
library(ggplot2)
# factor(id) treats id as nominal; fun replaces fun.y, deprecated since ggplot2 3.3.0
ggplot(aa, aes(x = factor(id), y = l1)) + stat_summary(fun = sum, geom = 'bar')