
Nested cut function to create a frequency table


I am building a frequency table, using the airquality dataset as an example. Here is the code:

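breaks <- seq(1.7, 20.7, by = 3.8)
# right = FALSE gives left-closed bins [a,b); include.lowest = TRUE also closes
# the last bin on the right, so the maximum Wind value (20.7) is not dropped as NA
airquality.split <- cut(airquality$Wind, breaks, right = FALSE, include.lowest = TRUE)
airquality.freq <- table(airquality.split)
# Absolute, relative (%), cumulative and cumulative relative frequencies
airquality.dist <- cbind(airquality.freq,
                         100 * airquality.freq / sum(airquality.freq),
                         cumsum(airquality.freq),
                         100 * cumsum(airquality.freq) / sum(airquality.freq))
colnames(airquality.dist) <- c('Frequency', 'Percentage', 'Cum.Frequency', 'Cum.Percentage')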
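A quick sketch for checking the binning. With right = FALSE the top interval is [16.9,20.7), so, assuming the maximum of airquality$Wind is exactly 20.7 (which the chosen breaks suggest), that observation falls outside every bin and table() silently ignores it. Adding include.lowest = TRUE, which for right = FALSE closes the highest break, keeps it. The names split_open and split_closed are only illustrative.

```r
breaks <- seq(1.7, 20.7, by = 3.8)

# Assumption: max(airquality$Wind) is 20.7, equal to the last break.
# With right = FALSE that value is outside [16.9,20.7) and becomes NA.
split_open <- cut(airquality$Wind, breaks, right = FALSE)
print(sum(is.na(split_open)))

# include.lowest = TRUE closes the highest break when right = FALSE,
# so the last level becomes [16.9,20.7] and nothing is lost
split_closed <- cut(airquality$Wind, breaks, right = FALSE, include.lowest = TRUE)
print(sum(is.na(split_closed)))
print(sum(table(split_closed)) == nrow(airquality))
print(levels(split_closed))
```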

I would like to do the same thing, but grouped by the factor Month: that is, obtain a single data frame with the frequency distribution of the Wind variable nested within every month, so that I can draw a histogram.

Month     Wind          Frequency  Percentage  Cum.Frequency  Cum.Percentage
Month 1   [1.7,5.5)     [...]      [...]       [...]          [...]
Month 1   [5.5,9.3)     [...]      [...]       [...]          [...]
Month 1   [9.3,13.1)    [...]      [...]       [...]          [...]
Month 1   [13.1,16.9)   [...]      [...]       [...]          [...]
Month 1   [16.9,20.7)   [...]      [...]       [...]          [...]
Month 2   [1.7,5.5)     [...]      [...]       [...]          [...]
Month 2   [5.5,9.3)     [...]      [...]       [...]          [...]
Month 2   [9.3,13.1)    [...]      [...]       [...]          [...]
Month 2   [13.1,16.9)   [...]      [...]       [...]          [...]
Month 2   [16.9,20.7)   [...]      [...]       [...]          [...]

[...]
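One way to get this layout without extra packages, sketched under the assumption that the question's four columns should simply be recomputed inside each month: wrap that computation in a function (freq_dist is a hypothetical helper, not part of base R), apply it to Wind split by Month, and stack the results. Note that airquality$Month actually takes the values 5 to 9 (May to September), not 1, 2, ...

```r
breaks <- seq(1.7, 20.7, by = 3.8)

# Hypothetical helper: the question's four columns for one numeric vector
freq_dist <- function(x, breaks) {
  # include.lowest = TRUE closes the top break, so Wind == 20.7 is not lost
  bins <- cut(x, breaks, right = FALSE, include.lowest = TRUE)
  freq <- as.vector(table(bins))  # one count per level, zero counts included
  data.frame(Wind           = levels(bins),
             Frequency      = freq,
             Percentage     = 100 * freq / sum(freq),
             Cum.Frequency  = cumsum(freq),
             Cum.Percentage = 100 * cumsum(freq) / sum(freq))
}

# One small data frame per month
per_month <- lapply(split(airquality$Wind, airquality$Month), freq_dist, breaks = breaks)

# Stack them, prefixing each block with the Month it came from
wind_by_month <- do.call(rbind, Map(function(m, d) cbind(Month = as.integer(m), d),
                                    names(per_month), per_month))
rownames(wind_by_month) <- NULL
print(wind_by_month)
```

Because table() is called on a factor, every month gets all five intervals, even those with no days.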

With these data I would like to draw a histogram in which each month is a separate series whose bars share one color, and within each month there are five bars for the percentages (or frequencies). Is it possible to do this directly with the cut function?
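Since the bins are computed beforehand, the requested chart is strictly a grouped bar chart rather than a hist() call. A base-R sketch, assuming barplot(beside = TRUE) is acceptable: each column of the matrix becomes one month's group of five bars. The colour names are arbitrary choices, not anything from the question.

```r
breaks <- seq(1.7, 20.7, by = 3.8)
wind_bin <- cut(airquality$Wind, breaks, right = FALSE, include.lowest = TRUE)

# Rows = Wind intervals, columns = months (5 = May ... 9 = September)
counts <- table(wind_bin, airquality$Month)

# Column-wise percentages: each month's five bars sum to 100
pct <- 100 * prop.table(counts, margin = 2)

# Bars are drawn group by group (column-major), so repeating each colour
# nrow(pct) times gives every bar of a month the same colour
month_cols <- rep(c("tomato", "gold", "seagreen", "steelblue", "orchid"),
                  each = nrow(pct))

# With beside = TRUE, barplot() returns the bar midpoints as a matrix
mp <- barplot(pct, beside = TRUE, col = month_cols,
              names.arg = month.abb[as.integer(colnames(pct))],
              xlab = "Month", ylab = "Percentage of days",
              main = "Wind speed distribution by month")
```

Passing only five colours (without rep(..., each = ...)) would instead colour the bars by Wind interval, which is the more common choice when a legend for the intervals is wanted.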

Thank you in advance.


Solution

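  • Use cut to break Wind into intervals, then, for each Month, compute the proportions with prop.table.

    library(dplyr)
    
    # Same breaks as in the question
    breaks <- seq(1.7, 20.7, by = 3.8)
    
    airquality %>%
      # include.lowest = TRUE keeps Wind == 20.7 in the last bin instead of NA
      count(Month, group = cut(Wind, breaks, right = FALSE, include.lowest = TRUE),
            name = 'Frequency') %>%
      group_by(Month) %>%
      # Percentages and cumulative columns are computed within each Month
      mutate(Percentage = prop.table(Frequency) * 100,
             Cum.Frequency = cumsum(Frequency),
             Cum.Percentage = Cum.Frequency / max(Cum.Frequency) * 100) %>%
      ungroup()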
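For comparison, a base-R sketch of the same table that does not need dplyr. cut() on its own only produces the bins; crossing them with Month in table() gives the nesting, and ave() computes the within-month columns. As far as I know, table() keeps every Month/interval combination, including those with zero days, whereas count() with its default .drop only returns combinations that occur. The object names tab and res are illustrative.

```r
breaks <- seq(1.7, 20.7, by = 3.8)

# Two-way table: one cell per Month x Wind-interval combination
tab <- table(Month = airquality$Month,
             Wind  = cut(airquality$Wind, breaks, right = FALSE,
                         include.lowest = TRUE))

# Long format; Month varies fastest, so within a month the rows are
# already in increasing Wind order, which cumsum() below relies on
res <- as.data.frame(tab, responseName = "Frequency")

# Within-month percentages and cumulative columns
res$Percentage     <- ave(res$Frequency, res$Month, FUN = function(f) 100 * f / sum(f))
res$Cum.Frequency  <- ave(res$Frequency, res$Month, FUN = cumsum)
res$Cum.Percentage <- ave(res$Percentage, res$Month, FUN = cumsum)

# Sort by Month, then by interval, to match the layout in the question
res <- res[order(res$Month, res$Wind), ]
print(head(res, 10))
```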