Search code examples
rgroup-bydplyrfrequencysummarize

R relative frequencies organized by Group


I have the following data frame:

> LikelyRenew_ReasonB %>%
    +   mutate(cum_Sum = ave(freq,Name,FUN = cumsum))
          Name x freq cum_Sum
 1       costC 1   10      10
 2       costC 2   11      21
 3       costC 3   17      38
 4       costC 4  149     187
 5   productsC 1   31      31
 6   productsC 2   40      71
 7   productsC 3   30     101
 8   productsC 4   86     187
 9     communC 1   51      51
 10    communC 2   50     101
 11    communC 3   34     135
 12    communC 4   52     187
 13 reimburseC 1   42      42
 14 reimburseC 2   26      68
 15 reimburseC 3   25      93
 16 reimburseC 4   94     187
 17    policyC 1   31      31
 18    policyC 2   25      56
 19    policyC 3   28      84
 20    policyC 4  103     187
 21  discountC 1    2       2
 22  discountC 2    2       4
 23  discountC 3    3       7
 24  discountC 4  180     187

Here is what the variable looks like:

> dput(head(LikelyRenew_ReasonB))
structure(list(Name = c("costC", "costC", "costC", "costC", "productsC", 
"productsC"), x = c(1, 2, 3, 4, 1, 2), freq = c(10L, 11L, 17L, 
149L, 31L, 40L)), .Names = c("Name", "x", "freq"), row.names = c(NA, 
6L), class = "data.frame")

I am trying to get for each group, the relative frequency for each frequency score and then the sum of the relative frequencies for the group. I put below a sample of what I am looking for - the 1st three rows are their freq/cum_Sum[x==4]. The last row should be a total of those 3 rows.

Is this possible?? I am totally stumped.

          Name x freq cum_Sum    IdealOutput   *how i calculated IdealOutput
 1       costC 1   10      10         5.35       (10/187)
 2       costC 2   11      21         5.88       (11/187)
 3       costC 3   17      38         9.09       (17/187) 
 4       costC 4  149     187         20.32      (sum of above 3 values)

Solution

  • You can try using dplyr::lag on cum_Sum to calculate IdealOutput for last row of a group.

    The last row of a group can be found using condition as row_number() == n()

    library(dplyr)
    
    LikelyRenew_ReasonB %>% group_by(Name) %>%
      arrange(Name, x) %>%
      mutate(cum_Sum = cumsum(freq)) %>%
      mutate(IdealOutput = ifelse(row_number() == n(), 
                           lag(cum_Sum)/sum(freq), freq/sum(freq))) 
    
    #   # A tibble: 6 x 5
    #   # Groups: Name [2]
    #   Name          x  freq cum_Sum IdealOutput
    # <chr>     <dbl> <int>   <int>       <dbl>
    # 1 costC      1.00    10      10      0.0535
    # 2 costC      2.00    11      21      0.0588
    # 3 costC      3.00    17      38      0.0909
    # 4 costC      4.00   149     187      0.203 
    # 5 productsC  1.00    31      31      0.437 
    # 6 productsC  2.00    40      71      0.437 
    

    Data:

    LikelyRenew_ReasonB  <- structure(list(Name = c("costC", "costC", "costC", "costC", "productsC", 
    "productsC"), x = c(1, 2, 3, 4, 1, 2), freq = c(10L, 11L, 17L, 
    149L, 31L, 40L)), .Names = c("Name", "x", "freq"), row.names = c(NA, 
    6L), class = "data.frame")