I have a table looking like this:
df <- read.table(text =
" Day location gender hashtags
'Feb 19 2016' 'UK' 'M' '#a'
'Feb 19 2016' 'UK' 'M' '#b'
'Feb 19 2016' 'SP' 'F' '#a'
'Feb 19 2016' 'SP' 'F' '#b'
'Feb 19 2016' 'SP' 'M' '#a'
'Feb 19 2016' 'SP' 'M' '#b'
'Feb 20 2016' 'UK' 'F' '#a'",
header = TRUE, stringsAsFactors = FALSE)
And I want to calculate frequencies by day/hashtag/location and gender, having a result table looking like this:
Day hashtags Daily_Freq men women Freq_UK Freq_SP
Feb 19 2016 #a 3 2 1 1 2
Feb 19 2016 #b 3 2 1 1 1
Feb 20 2016 #a 1 0 1 1 0
where Daily_freq=men+women=Freq_UK+Freq_SP How can I do this?
Using dplyr
:
library(dplyr)
df %>%
group_by(Day, hashtags) %>%
summarise(Daily_Freq = n(),
men = sum(gender == 'M'),
women = sum(gender == 'F'),
Freq_UK = sum(location == 'UK'),
Freq_SP = sum(location == 'SP'))
gives:
# A tibble: 3 x 7 # Groups: Day [?] Day hashtags Daily_Freq men women Freq_UK Freq_SP <chr> <chr> <int> <int> <int> <int> <int> 1 Feb 19 2016 #a 3 2 1 1 2 2 Feb 19 2016 #b 3 2 1 1 2 3 Feb 20 2016 #a 1 0 1 1 0
The same logic implemented in data.table
:
library(data.table)
setDT(df)[, .(Daily_Freq = .N,
men = sum(gender == 'M'),
women = sum(gender == 'F'),
Freq_UK = sum(location == 'UK'),
Freq_SP = sum(location == 'SP'))
, by = .(Day, hashtags)]