I am working with a data frame like this, with the ID
column indicating a specific publication:
ID  AuthorA  AuthorB  AuthorC
1   Chris    Lee      Jill
2   Jill     Tom      Lee
3   Tom      Chris    Lee
4   Lee      Jill     NA
5   Jill     Chris    NA
I would like to generate source, target, and count columns for a social network analysis; in other words, count the number of times two authors appear on the same publication. The data frame I am actually working with, however, has 18 author columns. This should be the final output:
Source  Target  Count
Chris   Lee     2
Chris   Jill    2
Lee     Jill    3
Jill    Tom     1
Tom     Lee     2
Tom     Chris   1
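For every row, you can create all pairwise combinations of the names with combn and count their frequency with table. Sorting each pair first makes sure that Chris-Lee and Lee-Chris are counted as the same pair.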
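# For each row: drop NAs, build every author pair, and sort the names
# within a pair so that the order of the author columns does not matter.
result <- stack(table(unlist(apply(df[-1], 1, function(x) {
  vec <- na.omit(x)
  if (length(vec) < 2) return(NULL)  # a single author has no co-authors
  combn(vec, 2, function(y) paste0(sort(y), collapse = '-'))
}))))[2:1]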
result
# ind values
#1 Chris-Jill 2
#2 Chris-Lee 2
#3 Chris-Tom 1
#4 Jill-Lee 3
#5 Jill-Tom 1
#6 Lee-Tom 2
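To see what happens inside the function for a single publication, here is a small sketch of the same steps applied to row 1 of the example. The vector x is typed in by hand here to mimic what apply would pass to the function.

```r
# Row 1 of the example data, written out by hand for illustration
x <- c(AuthorA = "Chris", AuthorB = "Lee", AuthorC = "Jill")

vec <- na.omit(x)  # nothing to drop in this row

# combn() visits the pairs (1,2), (1,3), (2,3); each pair is sorted
# alphabetically and pasted into a single "A-B" key
pairs <- combn(vec, 2, function(y) paste0(sort(y), collapse = "-"))
pairs
# "Chris-Lee" "Chris-Jill" "Jill-Lee"
```

Because Lee and Jill are sorted before pasting, the last pair comes out as Jill-Lee, which is the key that table then counts across all rows.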
To get them into separate columns, you can use tidyr::separate:
tidyr::separate(result, ind, c('Source', 'Target'), sep = '-')
# Source Target values
#2 Chris Jill 2
#3 Chris Lee 2
#4 Chris Tom 1
#6 Jill Lee 3
#7 Jill Tom 1
#9 Lee Tom 2
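If you would rather avoid the tidyr dependency and get the Source, Target and Count column names directly, the following is a rough base-R sketch of the same idea. It rebuilds the example data so it runs on its own; the object names pairs, counts, parts and edges are just illustrative. It assumes that no author name contains the separator "-" (a name like Jean-Luc would be split incorrectly, so pick a different separator in that case).

```r
# Example data from the question
df <- data.frame(
  ID      = 1:5,
  AuthorA = c("Chris", "Jill", "Tom", "Lee", "Jill"),
  AuthorB = c("Lee", "Tom", "Chris", "Jill", "Chris"),
  AuthorC = c("Jill", "Lee", "Lee", NA, NA),
  stringsAsFactors = FALSE
)

# All sorted author pairs per publication, as "A-B" keys
pairs <- unlist(lapply(seq_len(nrow(df)), function(i) {
  vec <- na.omit(unlist(df[i, -1]))   # author names of row i, NAs dropped
  if (length(vec) < 2) return(NULL)   # a single author has no co-authors
  combn(vec, 2, function(y) paste(sort(y), collapse = "-"))
}))

# Count each key, then split it back into two columns
counts <- as.data.frame(table(pairs), stringsAsFactors = FALSE)
parts  <- do.call(rbind, strsplit(as.character(counts$pairs), "-", fixed = TRUE))

edges <- data.frame(
  Source = parts[, 1],
  Target = parts[, 2],
  Count  = counts$Freq,
  stringsAsFactors = FALSE
)
edges
```

Nothing here depends on the number of author columns, since df[i, -1] takes every column except ID, so it should carry over to the 18-column case unchanged.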