How do I count the number of times any two given values occur together in a row in R?


I am working with a data frame like this, with the ID column indicating a specific publication:

ID AuthorA AuthorB AuthorC
1   Chris   Lee     Jill
2   Jill    Tom     Lee 
3   Tom     Chris   Lee
4   Lee     Jill    NA
5   Jill    Chris   NA
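
To reproduce the sample above, it can be built as a data frame named df, which is the name the answer's code expects. This is a sketch that assumes empty author slots are stored as NA and the author columns hold character strings.

```r
# Sample data from the question; NA marks an empty author slot
df <- data.frame(
  ID      = 1:5,
  AuthorA = c("Chris", "Jill", "Tom", "Lee", "Jill"),
  AuthorB = c("Lee", "Tom", "Chris", "Jill", "Chris"),
  AuthorC = c("Jill", "Lee", "Lee", NA, NA),
  stringsAsFactors = FALSE  # keep names as character, not factor
)
str(df)
```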

I would like to generate Source, Target, and Count columns for a social network analysis; in other words, count the number of times each pair of authors appears on the same publication. The real data frame, however, has 18 author columns rather than three. This should be the final output:

Source Target Count
Chris   Lee     2
Chris   Jill    2
Lee     Jill    3
Jill    Tom     1
Tom     Lee     2
Tom     Chris   1

Solution

  • For every row, create all combinations of two names with combn, then count the frequency of each combination with table.

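    # Work row by row on the author columns (df[-1] drops ID)
    result <- stack(table(unlist(apply(df[-1], 1, function(x) {
      vec <- na.omit(x)                    # drop empty author slots
      if (length(vec) < 2) return(NULL)    # a single author forms no pair
      # every pair, sorted so "Lee-Chris" and "Chris-Lee" are the same key
      combn(vec, 2, function(y) paste0(sort(y), collapse = '-'))
    }))))[2:1]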
    result
    #         ind values
    #1 Chris-Jill      2
    #2  Chris-Lee      2
    #3  Chris-Tom      1
    #4   Jill-Lee      3
    #5   Jill-Tom      1
    #6    Lee-Tom      2
    

    To get the two names in separate columns, you can use separate from tidyr:

    tidyr::separate(result, ind, c('Source', 'Target'), sep = '-')
    
    #  Source Target values
    #2  Chris   Jill      2
    #3  Chris    Lee      2
    #4  Chris    Tom      1
    #6   Jill    Lee      3
    #7   Jill    Tom      1
    #9    Lee    Tom      2
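
A base-R variant, offered as a sketch rather than the answer's own method: instead of pasting each pair into one string and splitting it again with separate, it keeps the two names in separate columns from the start, so it does not break if an author name contains '-'. It assumes, as in the question, that the first column is the ID and every other column holds an author name or NA, so it works unchanged with 18 author columns. The unique() call is an added assumption that the same author listed twice on one publication should not pair with themselves.

```r
# Sample data from the question (NA marks an empty author slot)
df <- data.frame(
  ID      = 1:5,
  AuthorA = c("Chris", "Jill", "Tom", "Lee", "Jill"),
  AuthorB = c("Lee", "Tom", "Chris", "Jill", "Chris"),
  AuthorC = c("Jill", "Lee", "Lee", NA, NA),
  stringsAsFactors = FALSE
)

# For each publication (row), list every unordered pair of co-authors.
# Assumption: column 1 is the ID and all remaining columns hold author names.
pairs_per_row <- lapply(seq_len(nrow(df)), function(i) {
  # as.character() guards against factor author columns
  authors <- as.character(unlist(df[i, -1], use.names = FALSE))
  # drop NA slots, de-duplicate, and sort so each pair has a canonical order
  authors <- sort(unique(authors[!is.na(authors)]))
  if (length(authors) < 2) return(NULL)  # one author means no pair
  # combn() returns a 2 x k matrix; t() turns it into k rows of (Source, Target)
  t(combn(authors, 2))
})

# Stack the pairs from all rows; rbind() skips the NULL entries
edges <- as.data.frame(do.call(rbind, pairs_per_row), stringsAsFactors = FALSE)
names(edges) <- c("Source", "Target")

# Each row of edges is one co-authorship; sum them per pair
edges$Count <- 1L
result <- aggregate(Count ~ Source + Target, data = edges, FUN = sum)
result <- result[order(result$Source, result$Target), ]
rownames(result) <- NULL
result
```

On the sample data this gives the same six pairs and counts as the table() approach. Because each row's authors are sorted before combn, Source and Target carry no direction, which fits an undirected co-authorship network.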