Search code examples
rfrequency

Sum up cells by some columns


I have a table looking like this:

 df <- read.table(text = 
      "  Day            location     gender    hashtags
       'Feb 19 2016'       'UK'      'M'       '#a'
       'Feb 19 2016'       'UK'      'M'       '#b'
       'Feb 19 2016'       'SP'      'F'       '#a'
       'Feb 19 2016'       'SP'      'F'       '#b'
       'Feb 19 2016'       'SP'      'M'       '#a'
       'Feb 19 2016'       'SP'      'M'       '#b'
       'Feb 20 2016'       'UK'      'F'       '#a'", 
                 header = TRUE, stringsAsFactors = FALSE) 

And I want to calculate frequencies by day/hashtag/location and gender, having a result table looking like this:

           Day hashtags Daily_Freq men women Freq_UK Freq_SP
   Feb 19 2016       #a          3   2     1       1       2
   Feb 19 2016       #b          3   2     1       1       1
   Feb 20 2016       #a          1   0     1       1       0

where Daily_freq=men+women=Freq_UK+Freq_SP How can I do this?


Solution

  • Using dplyr:

    library(dplyr)
    df %>% 
      group_by(Day, hashtags) %>% 
      summarise(Daily_Freq = n(),
                men = sum(gender == 'M'),
                women = sum(gender == 'F'),
                Freq_UK = sum(location == 'UK'),
                Freq_SP = sum(location == 'SP'))
    

    gives:

    # A tibble: 3 x 7
    # Groups:   Day [?]
      Day         hashtags Daily_Freq   men women Freq_UK Freq_SP
      <chr>       <chr>         <int> <int> <int>   <int>   <int>
    1 Feb 19 2016 #a                3     2     1       1       2
    2 Feb 19 2016 #b                3     2     1       1       2
    3 Feb 20 2016 #a                1     0     1       1       0
    

    The same logic implemented in data.table:

    library(data.table)
    setDT(df)[, .(Daily_Freq = .N,
                  men = sum(gender == 'M'),
                  women = sum(gender == 'F'),
                  Freq_UK = sum(location == 'UK'),
                  Freq_SP = sum(location == 'SP'))
              , by = .(Day, hashtags)]