Search code examples
rfrequencyvariancefrequency-distribution

Calculating grouped variance from a frequency table in R


How can I, in R calculate the overall variance and the variance for each group from a dataset that looks like this (for example):

Group Count Value
A      3     5
A      2     8
B      1     11
B      3     15

I know to calculate the variance as a whole, ignoring the groups I would do: var(rep(x$Value, x$Count)), but how do I automatically calculate the variance for each group accounting for the frequency? E.g., the variance for group A, group B, etc.,.. I would like my output to have the following headers:

Group, Total Count, Group Variance 

I have also reviewed this link; R computing mean, median, variance from file with frequency distribution which is different (does not have the group component) so this is not a duplicate.

thank you for all of the help.


Solution

  • One option is using data.table. Convert the data.frame to data.table (setDT) and get the var of "Value" and sum of "Count" by "Group".

    library(data.table)
    setDT(df1)[, list(GroupVariance=var(rep(Value, Count)),
                          TotalCount=sum(Count)) , by = Group]
    #    Group GroupVariance TotalCount
    #1:     A           2.7          5
    #2:     B           4.0          4
    

    a similar way using dplyr is

    library(dplyr)
    group_by(df1, Group) %>% 
          summarise(GroupVariance=var(rep(Value,Count)), TotalCount=sum(Count))
    #     Group GroupVariance TotalCount
    #1     A           2.7          5
    #2     B           4.0          4