I have data like the example below. To find a frequency distribution, I can use the hist command as shown below, and then use histz$breaks and histz$counts to find the number of observations that fall within each range.
I would like to get the distribution of column b for each value in column a. My column a is going to have 6 distinct values.
My expected output is a data frame which would have
My data
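a=c("a","a","b","a","b","b","c","a")   # group labels
b=c(1,3,4,3,5,7,8,9)                   # values to bin
trial=data.frame(a,b)
# count b in the intervals (0,4], (4,6], (6,100]; plot=FALSE skips drawing
histz=hist(trial$b, breaks=c(0,4,6,100),plot=FALSE)
histz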
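A minimal sketch of how the hist() approach above could be applied per group, assuming every group uses the same breaks: split b by the values of a, run hist() on each piece, and keep only its counts. The names brks and by_group are illustrative, not from the original.

```r
a <- c("a","a","b","a","b","b","c","a")
b <- c(1,3,4,3,5,7,8,9)
trial <- data.frame(a, b)
brks <- c(0, 4, 6, 100)  # same breaks as in the question

# Overall counts: one entry per interval (0,4], (4,6], (6,100]
histz <- hist(trial$b, breaks = brks, plot = FALSE)
histz$counts

# Per-group counts: split b by a, run hist() on each piece and keep $counts;
# sapply() binds the results into a matrix with one column per value of a
by_group <- sapply(split(trial$b, trial$a),
                   function(x) hist(x, breaks = brks, plot = FALSE)$counts)
# Label the rows with the same interval names that cut() would produce
rownames(by_group) <- levels(cut(trial$b, breaks = brks))
by_group
```

One difference worth knowing: hist() stops with an error if some values of b fall outside the breaks, whereas cut() returns NA for them.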
You can use cut() to categorize b, then table() to obtain the distribution in each range. In your example,
tab = table(cut(trial$b,breaks=c(0,4,6,100)),trial$a)
Produces
a b c
(0,4] 3 1 0
(4,6] 0 1 0
(6,100] 1 1 1
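Since the question asks for a data frame, here is a sketch of two base-R conversions of the table: a wide layout (one row per interval, one column per value of a) and a long layout (one row per interval/group pair). Naming the table dimensions (interval and group are illustrative names) gives readable column names in the long form.

```r
a <- c("a","a","b","a","b","b","c","a")
b <- c(1,3,4,3,5,7,8,9)
trial <- data.frame(a, b)

# Naming the dimensions carries through to the long data frame's column names
tab <- table(interval = cut(trial$b, breaks = c(0, 4, 6, 100)),
             group    = trial$a)

# Wide layout: intervals become row names, values of a become columns
wide <- as.data.frame.matrix(tab)
wide

# Long layout: one row per (interval, group) pair, with the count in Freq
long <- as.data.frame(tab)
long
```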
If you want proportions within each column (that is, for each value of a), you can use
ptab = prop.table(tab,margin=2)
and, to round to 2 digits,
rtab = round(ptab,2)
resulting in
a b c
(0,4] 0.75 0.33 0.00
(4,6] 0.00 0.33 0.00
(6,100] 0.25 0.33 1.00
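A quick sketch to check which way prop.table() normalises: with margin = 2 each column sums to 1, while margin = 1 divides by row totals instead, giving the mix of a within each interval of b. The name row_share is illustrative.

```r
a <- c("a","a","b","a","b","b","c","a")
b <- c(1,3,4,3,5,7,8,9)
trial <- data.frame(a, b)
tab <- table(cut(trial$b, breaks = c(0, 4, 6, 100)), trial$a)

# margin = 2: each count divided by its column total (distribution of b within each a)
ptab <- prop.table(tab, margin = 2)
colSums(ptab)

# margin = 1: each count divided by its row total (share of each a within an interval)
row_share <- prop.table(tab, margin = 1)
rowSums(row_share)
```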
Finally, if you want to convert to percentages, use the formattable library:
library(formattable)
prtab = apply(rtab,1:2,percent,digits=0)
a b c
(0,4] "75%" "33%" "0%"
(4,6] "0%" "33%" "0%"
(6,100] "25%" "33%" "100%"
You can control the precision with the digits argument.
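If you would rather avoid the extra dependency, a base-R sketch that swaps in sprintf() instead of formattable: it formats the proportions as percentage strings and, by assigning with pct[] <-, keeps the table's dimensions and row/column labels. The name pct is illustrative.

```r
a <- c("a","a","b","a","b","b","c","a")
b <- c(1,3,4,3,5,7,8,9)
trial <- data.frame(a, b)
tab  <- table(cut(trial$b, breaks = c(0, 4, 6, 100)), trial$a)
ptab <- prop.table(tab, margin = 2)

# sprintf() returns a plain character vector; assigning it via pct[] <- ...
# keeps the dim and dimnames of ptab. "%.0f" means zero decimal places.
pct <- ptab
pct[] <- sprintf("%.0f%%", 100 * ptab)
pct
```

Changing "%.0f" to "%.1f" gives one decimal place, playing the same role as the digits argument above.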