Search code examples
rnetwork-analysisedge-list

How to create pairs from a single column counting the occurrence in R?


So I'm working on creating an edges file for a social network analysis based on IMDb data. And I've run into a problem and I can't figure out how to fix it as I'm new to R.

Assuming I have the following dataframe:

movieID <- c('A', 'A','A', 'B','B', 'C','C', 'C')
crewID <- c('Z', 'Y', 'X', 'Z','V','V', 'X', 'Y')
rating <- c('7.3','7.3', '7.3', '2.1', '2.1', '9.0','9.0', '9.0')
df <- data.frame(movieID, crewID, rating)
movieID CrewID Rating
A Z 7.3
A Y 7.3
A X 7.3
B Z 2.1
B V 2.1
C V 9.0
C X 9.0
C Y 9.0

I am trying to build unique pairs of CrewIDs within a movie with a weight that equals the occurrence of that pair, meaning how often these two crew members have worked on a movie together. So basically I want a dataframe like the following as a result:

CrewID1 CrewID2 weight (not a col but explanation)
Z Y 1 together once in movie A
Z X 1 together once in movie A
Y X 2 together twice in movies A and C
Z V 1 together once in movie B
V X 1 together once in movie C
V Y 1 together once in movie C

The pairs (Z,Y) and (Y,Z) are equal to each other as I don't care about direction.

I found the following StackOverflow thread on a similar issue: How to create pairs from a single column based on order of occurrence in R?

However in my case this skips the combination (V,Y) and (X,Z) and the count for (X,Y) is still 1 and I can't figure out how to fix it.


Solution

  • m <- crossprod(table(df[-3]))
    m[upper.tri(m, diag = TRUE)] <-0
    subset(as.data.frame.table(m), Freq > 0)
    
       CrewID CrewID.1 Freq
    2       X        V    1
    3       Y        V    1
    4       Z        V    1
    7       Y        X    2
    8       Z        X    1
    12      Z        Y    1