Tags: r, igraph, plyr, frequency-analysis, chord-diagram

How often do observations occur together in the same row (R)?


I have a dataframe that is comparable to the one below:

V1 V2 V3 V4 V5 V6 V7
 A B  C  D  NA NA NA
 A E  F  NA NA NA NA
 D A  C  B  F  E  NA
 A E  NA NA NA NA NA

Each row is a patient and each letter in the dataframe represents a specific diagnosis.
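
For anyone who wants to try this, the table above could be rebuilt roughly as follows. This is only a sketch: it assumes the columns are character vectors and that empty slots are real `NA` values, and the name `df` is just a placeholder.

```r
# Rebuild the example table; NA marks an empty diagnosis slot.
# Assumption: columns are character (not factor) and missing cells are NA.
df <- data.frame(
  V1 = c("A", "A", "D", "A"),
  V2 = c("B", "E", "A", "E"),
  V3 = c("C", "F", "C", NA),
  V4 = c("D", NA,  "B", NA),
  V5 = c(NA,  NA,  "F", NA),
  V6 = c(NA,  NA,  "E", NA),
  V7 = NA_character_,        # recycled to all four rows
  stringsAsFactors = FALSE
)
```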

I want to find how often specific diagnoses occur together, e.g. how many times does diagnosis A occur with diagnosis E in the same row? (Three times.)

I am hoping to produce a matrix like this:
  A B C D E F
A 0 2 2 2 3 2
B 2 0 2
C 2 2 0 etc etc
D 2
E 3
F 2

(I have not completely filled it out)

It is essentially an adjacency matrix except that the observations don't need to be directly adjacent, they just need to be on the same row.

From here I would then produce a chord diagram.

Thank you for any help!


Solution

  • I thought it would be fun to construct this by hand. The algorithm is pretty simple: for each patient, find which pairs of diagnoses co-occur and add one to the matching cell of an upper triangular matrix. Each pair is counted at most once per patient.

    set.seed(357)
    xy <- matrix(sample(LETTERS[1:15], size = 80, replace = TRUE), nrow = 8)
    
    > head(xy)
         [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    [1,] "G"  "F"  "M"  "N"  "D"  "G"  "N"  "H"  "K"  "K"  
    [2,] "H"  "I"  "C"  "K"  "H"  "E"  "H"  "E"  "I"  "G"  
    [3,] "G"  "C"  "C"  "L"  "N"  "F"  "M"  "K"  "C"  "E"  
    [4,] "A"  "K"  "G"  "O"  "I"  "C"  "C"  "B"  "O"  "I"  
    [5,] "K"  "O"  "E"  "B"  "M"  "O"  "F"  "C"  "L"  "N"  
    [6,] "D"  "H"  "K"  "H"  "I"  "N"  "B"  "F"  "A"  "H" 
    
    # Find all unique diagnoses.
    all.diagnoses <- unique(as.vector(xy))
    all.diagnoses <- sort(as.character(all.diagnoses))
    
    # Create an empty (all-NA) square matrix with diagnoses as row and column names.
    out <- matrix(NA, nrow = length(all.diagnoses), ncol = length(all.diagnoses),
                  dimnames = list(all.diagnoses, all.diagnoses))
    
    for (i in seq_len(nrow(xy))) {
      # Unique, non-missing diagnoses for this patient.
      dx <- unique(xy[i, ])
      dx <- dx[!is.na(dx)]
      if (length(dx) < 2) next  # combn() needs at least two diagnoses
      combinations <- combn(dx, m = 2, simplify = FALSE)
      for (j in seq_along(combinations)) {
        # Sort the pair so the count always lands in the upper triangle.
        com <- sort(combinations[[j]])
        out[com[1], com[2]] <- sum(out[com[1], com[2]], 1, na.rm = TRUE)
      }
    }
    
    > out
       A  B  C  D  E  F  G  H  I  J  K  L  M  N  O
    A NA  2  1  2 NA  1  1  1  2  1  3  1  1  2  2
    B NA NA  2  1  1  2  1  1  2 NA  3  1  1  2  2
    C NA NA NA NA  3  2  3  1  2 NA  4  2  2  2  2
    D NA NA NA NA NA  2  1  3  2  2  3  1  3  4  2
    E NA NA NA NA NA  2  2  1  1 NA  3  2  2  2  1
    F NA NA NA NA NA NA  2  2  1 NA  4  2  3  4  1
    G NA NA NA NA NA NA NA  2  2 NA  4  1  2  2  1
    H NA NA NA NA NA NA NA NA  3  1  3 NA  2  3  1
    I NA NA NA NA NA NA NA NA NA  1  3 NA  1  2  2
    J NA NA NA NA NA NA NA NA NA NA  1  1  2  2  2
    K NA NA NA NA NA NA NA NA NA NA NA  3  4  5  3
    L NA NA NA NA NA NA NA NA NA NA NA NA  3  3  2
    M NA NA NA NA NA NA NA NA NA NA NA NA NA  5  3
    N NA NA NA NA NA NA NA NA NA NA NA NA NA NA  3
    O NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
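
The result above is upper triangular with `NA` for pairs that never co-occur, while the question asks for a full symmetric matrix with zeros on the diagonal. As an alternative to the loop (not part of the answer above), one possible sketch is to build a patient-by-diagnosis incidence matrix and take its cross-product. It is shown here on the question's own four-row example; the names `dx`, `keep`, `inc` and `co` are placeholders, and the data is assumed to be a character matrix with `NA` for empty slots.

```r
# Question's example data as a character matrix (NA = no diagnosis).
dx <- rbind(
  c("A", "B", "C", "D", NA,  NA,  NA),
  c("A", "E", "F", NA,  NA,  NA,  NA),
  c("D", "A", "C", "B", "F", "E", NA),
  c("A", "E", NA,  NA,  NA,  NA,  NA)
)

# Patient-by-diagnosis incidence matrix: 1 if the patient has the diagnosis.
# "> 0" makes a diagnosis repeated within a row count only once.
keep <- !is.na(dx)
inc  <- (unclass(table(row(dx)[keep], dx[keep])) > 0) * 1L

# t(inc) %*% inc counts, for every pair of diagnoses, the patients having both.
co <- crossprod(inc)
diag(co) <- 0          # a diagnosis does not co-occur with itself

co["A", "E"]           # 3, as stated in the question

# Assumption: with the circlize package installed, a symmetric matrix like
# this could be passed on to a chord diagram, e.g.
# circlize::chordDiagram(co, symmetric = TRUE)
```

Because the cross-product fills both triangles, `co` is already symmetric, so there is no need to mirror an upper triangle afterwards.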