I have a data frame like this:
message.id sender recipient
1 1 A B
2 1 A C
3 2 A B
4 3 B C
5 3 B D
6 3 B Q
I would like to summarize it by the counts of values in the sender and recipient columns to get this:
address messages.sent messages.received
1 A 3 0
2 B 3 2
3 C 0 2
4 D 0 1
5 Q 0 1
I have working code, but it's messy, and I'm hoping there's a way to do this all in one magrittr
chain instead of what I have below:
df <- data.frame(message.id = c(1,1,2,3,3,3),
sender = c("A","A","A","B","B","B"),
recipient = c("B","C","B","C","D","Q"))
sent <- df %>%
group_by(sender) %>%
summarise(messages.sent = n()) %>%
mutate(address = sender) %>%
select(address, messages.sent)
received <- df %>%
group_by(recipient) %>%
summarise(messages.received = n()) %>%
mutate(address = recipient) %>%
select(address, messages.received)
df_summary <- merge(sent, received, all = TRUE) %>%
replace(is.na(.), 0)
We can use melt/dcast
library(reshape2)
dcast(melt(df1, id.var='message.id'), value~variable,
value.var='message.id', length)
Or using a wrapper recast
recast(df1, id.var='message.id', value~variable, length)
# value sender recipient
#1 A 3 0
#2 B 3 2
#3 C 0 2
#4 D 0 1
#5 Q 0 1
If we need to use dplyr/tidyr
library(dplyr)
library(tidyr)
gather(df1, messages, address, 2:3) %>%
group_by(messages, address) %>%
summarise(n=n()) %>%
spread(messages, n, fill=0)
# address sender recipient
# (chr) (dbl) (dbl)
#1 A 3 0
#2 B 3 2
#3 C 0 2
#4 D 0 1
#5 Q 0 1