Search code examples
raggregatevariable-length

Summing up values in one column based on unique values in another column


I am trying to add the values in column C based on unique values in column B.For instance,for B = 1,I would like to add all rows in column C i.e. 5+4+3=12.

A B C
1 1 5
2 1 4 
3 1 3
4 2 1
5 2 3

for(i in unique(df$B)){
  df$D = sum(df$C)  
}

Also, I would like to add the number of times each data in column B occurs.

Solution :

A B C D  E 
1 1 5 12 3
2 1 4 12 3
3 1 3 12 3
4 2 1 4  2
5 2 3 4  2

example from my task :

  docIdx newsgroup_ID  freq  
       1            1   768 
       2            1   125  
       3            1    29 
       4            1    51  
       5            1   198 
       6            1    34 
       7            1    64 
       8            2    35
       9            2    70
       10           2    45

Solution

  • In base R you could use ave

    df[, c("D", "E")] <- with(df, sapply(c(sum, length), function(x) ave(C, B, FUN = x)))
    df
    #  A B C  D E
    #1 1 1 5 12 3
    #2 2 1 4 12 3
    #3 3 1 3 12 3
    #4 4 2 1  4 2
    #5 5 2 3  4 2
    

    Or using dplyr

    library(dplyr)
    df <- df %>%
        group_by(B) %>%
        mutate(D = sum(C), E = length(C))
    df
    ## A tibble: 5 x 5
    ## Groups:   B [2]
    #      A     B     C     D     E
    #  <int> <int> <int> <int> <int>
    #1     1     1     5    12     3
    #2     2     1     4    12     3
    #3     3     1     3    12     3
    #4     4     2     1     4     2
    #5     5     2     3     4     2
    

    Sample data

    df <- read.table(text =
        "A B C
    1 1 5
    2 1 4
    3 1 3
    4 2 1
    5 2 3", header = T)
    

    It works just fine with your revised data

    df <- read.table(text =
        "docIdx newsgroup_ID  freq
           1            1   768
           2            1   125
           3            1    29
           4            1    51
           5            1   198
           6            1    34
           7            1    64
           8            2    35
           9            2    70
           10           2    45", header = T)
    
    
    df[, c("sum.freq", "length.freq")] <- with(df, sapply(c(sum, length), function(x) 
        ave(freq, newsgroup_ID, FUN = x)))
    #   docIdx newsgroup_ID freq sum.freq length.freq
    #1       1            1  768     1269           7
    #2       2            1  125     1269           7
    #3       3            1   29     1269           7
    #4       4            1   51     1269           7
    #5       5            1  198     1269           7
    #6       6            1   34     1269           7
    #7       7            1   64     1269           7
    #8       8            2   35      150           3
    #9       9            2   70      150           3
    #10     10            2   45      150           3
    

    Here ave(freq, newsgroup_ID, FUN = x) applies function x to freq by newsgroup_ID.