I am building a frequency table, using the airquality dataset as an example. Here is the code:
# airquality ships with R (datasets package), so neither attach() nor a copy is needed
breaks = seq(1.7, 20.7, by=3.8)
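# right=FALSE gives [a,b) classes; include.lowest=TRUE closes the last class so Wind = 20.7 is counted
airquality.split = cut(airquality$Wind, breaks, right=FALSE, include.lowest=TRUE)
airquality.freq = table(airquality.split)
# frequency, percentage, cumulative frequency and cumulative percentage per class
airquality.dist = cbind(airquality.freq, 100*airquality.freq/sum(airquality.freq),
                        cumsum(airquality.freq), 100*cumsum(airquality.freq)/sum(airquality.freq))
colnames(airquality.dist) = c('Frequency', 'Percentage', 'Cum.Frequency', 'Cum.Percentage')
airquality.dist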
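A quick sanity check on the classes, as a sketch: with right = FALSE every class is half-open [a,b), so a Wind value equal to the top break 20.7 matches no class and becomes NA, which table() then drops without warning. Adding include.lowest = TRUE (together with right = FALSE) closes the last class. The names open_top and closed_top are illustrative only.

```r
# Same class limits as above: 1.7, 5.5, 9.3, 13.1, 16.9, 20.7
breaks <- seq(1.7, 20.7, by = 3.8)

# right = FALSE alone: values equal to the top break (20.7) fall in no class -> NA
open_top <- cut(airquality$Wind, breaks, right = FALSE)

# include.lowest = TRUE with right = FALSE closes the last class as [16.9,20.7]
closed_top <- cut(airquality$Wind, breaks, right = FALSE, include.lowest = TRUE)

# table() ignores NA, so compare the totals with the number of rows
sum(table(open_top))    # smaller than nrow(airquality) if any Wind equals 20.7
sum(table(closed_top))  # every row is counted
levels(closed_top)
```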
I would like to do the same thing but grouped by the factor Month, i.e. obtain a single data frame with the frequency distribution of the Wind variable nested within each month, so that I can draw a histogram:
Month    Wind.class    Frequency  Percentage  Cum.Frequency  Cum.Percentage
Month 1  [1.7,5.5)     [...]      [...]       [...]          [...]
Month 1  [5.5,9.3)     [...]      [...]       [...]          [...]
Month 1  [9.3,13.1)    [...]      [...]       [...]          [...]
Month 1  [13.1,16.9)   [...]      [...]       [...]          [...]
Month 1  [16.9,20.7)   [...]      [...]       [...]          [...]
Month 2  [1.7,5.5)     [...]      [...]       [...]          [...]
Month 2  [5.5,9.3)     [...]      [...]       [...]          [...]
Month 2  [9.3,13.1)    [...]      [...]       [...]          [...]
Month 2  [13.1,16.9)   [...]      [...]       [...]          [...]
Month 2  [16.9,20.7)   [...]      [...]       [...]          [...]
[...]
With these data I would like to draw a histogram in which each month is a separate series (all of its bars in the same color) and, within each month, five bars show the percentages (or frequencies) of the Wind classes. Is it possible to do this directly with the cut function?
Thank you in advance.
Using cut you can break Wind into classes and then, for each Month, calculate the percentages with prop.table.
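library(dplyr)
breaks <- seq(1.7, 20.7, by = 3.8)
airquality %>%
  # one row per Month x Wind class; include.lowest = TRUE keeps Wind = 20.7,
  # .drop = FALSE keeps classes with zero days so every month has five rows
  count(Month = factor(Month),
        Wind.class = cut(Wind, breaks, right = FALSE, include.lowest = TRUE),
        name = 'Frequency', .drop = FALSE) %>%
  group_by(Month) %>%
  # percentages and cumulative values are computed within each month
  mutate(Percentage = prop.table(Frequency) * 100,
         Cum.Frequency = cumsum(Frequency),
         Cum.Percentage = Cum.Frequency / max(Cum.Frequency) * 100) %>%
  ungroup()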
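For the chart itself, a base-R sketch that needs no extra packages, working directly from cut(): a two-way table() gives the counts per month, prop.table(margin = 1) turns each month's row into percentages that sum to 100, and barplot(beside = TRUE) draws one group of five bars per month. Strictly this is a grouped bar chart of binned percentages rather than hist() output. The variable names (wind_class, freq, pct, cum_pct, month_cols) and the rainbow() palette are illustrative choices, not part of the original answer.

```r
# Classes as in the question, with the last class closed so Wind = 20.7 is kept
breaks <- seq(1.7, 20.7, by = 3.8)
wind_class <- cut(airquality$Wind, breaks, right = FALSE, include.lowest = TRUE)

# Counts: one row per Month, one column per Wind class (empty classes show as 0)
freq <- table(Month = airquality$Month, Wind.class = wind_class)

# Row-wise percentages: each month's five classes sum to 100
pct <- prop.table(freq, margin = 1) * 100

# Cumulative percentages within each month; apply() puts classes in rows, so transpose back
cum_pct <- t(apply(pct, 1, cumsum))

# Grouped bars: t(pct) has classes as rows and months as columns, so each column
# (month) becomes one group of five bars; repeating each month's color once per
# class gives all bars of a month the same color
month_cols <- rainbow(nrow(pct))
barplot(t(pct), beside = TRUE,
        col = rep(month_cols, each = ncol(pct)),
        xlab = "Month", ylab = "Percentage of days")
legend("topright", legend = rownames(pct), fill = month_cols, title = "Month")
```

Within each group the bars follow the class order from [1.7,5.5) to [16.9,20.7]; plotting freq instead of pct shows frequencies rather than percentages.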