Search code examples
rmatrix

How can I find matrices with zero determinants in the data set? - Too many equal marginals, cannot compute


I'm trying to calculate inter-rater agreement (Stuart-Maxwell Test). I've come to a certain point. I also detected the problem. However, I ran into a problem somewhere. How can I find the pattern in the data set? In order to continue the analysis, I need to remove the problematic matrices from the data set.

# 2 raters, 4 different response category and 4 subjects
R1 <- as.data.frame(expand.grid(rep(list(0:3), 4)))
R2 <- as.data.frame(expand.grid(rep(list(0:3), 4)))

# nrow(R1)*nrow(R2) = 256
pattern <- expand.grid(1:256, 1:256)
allPossible <- vector(mode = "list", length = nrow(pattern))
for(i in 1:nrow(pattern)){
  allPossible[[i]] <- rbind(R1[pattern[i, 1], ], R2[pattern[i, 2], ])
}

# 256*256 = 65536 possible rating
allPossible

# cross-tabulation
levs <- c(0:2)
list1 <- list()
list2 <- list()
list_A <- list()

for (i in 1:65536) {
  list1[[i]] <- factor(allPossible[[i]][1, ], levels = levs)
  list2[[i]] <- factor(allPossible[[i]][2, ], levels = levs)
  list_A[[i]] <- xtabs(~list1[[i]] + list2[[i]])
}

list_A

# 256 out of 65536 are completely same ratings. I have to delete them. (eg. check allPossible[1], allPossible[258])
list_B <- list_A[-seq(from = 1, to = 65536, by = 257)]

# Stuart-Maxwell Test
install.packages("DescTools")
library(DescTools)

# Now, I have 65280 different matrices
outputSM <- list()

for (i in 1:65280) {
  outputSM[[i]] <- StuartMaxwellTest(list_B[[i]])

}

outputSM

# it didn't work. Because, there are still some 3x3 matrices with the determinant 0 (zero).

# eg. problematic 3x3 matrices
StuartMaxwellTest(list_A[[260]])
StuartMaxwellTest(list_A[[820]])

# yes I found the source of some of problematic matrices 
allPossible[260]
allPossible[820]


Solution

  • The problem isn't necessarily just determinants equal to zero. There are some matrices where the determinant is equal to zero, but the test is still valid. Take, for example, the second element of list_A:

    > det(as.matrix(list_A[[2]]))
    [1] 0
    > StuartMaxwellTest(list_A[[2]])
    
        Stuart-Maxwell test
    
    data:  list_A[[2]]
    chi-squared = 1, df = 2, p-value = 0.6065
    

    The check that the test is running can be seen on lines 25-31 of the source code.

      rowsums <- rowSums(x)
      colsums <- colSums(x)
      equalsums <- diag(x) == rowsums & diag(x) == colsums
      if (any(equalsums)) {
        x <- x[!equalsums, !equalsums]
        if (dim(x)[1] < 2) 
          stop("Too many equal marginals, cannot compute")
    

    To figure out ahead of time which ones are valid, you could write a little function that just did that test:

    validSM <- function(x){
      rowsums <- rowSums(x)
      colsums <- colSums(x)
      equalsums <- diag(x) == rowsums & diag(x) == colsums
      if (any(equalsums)) {
        x <- x[!equalsums, !equalsums]
      }
      dim(x)[1] >= 2
      
    }
    

    Then, you could apply that function to your list of matrices:

    > valid <- sapply(list_A, validSM)
    > table(valid)
    valid
    FALSE  TRUE 
    10000 55536 
    

    This says that there are 10000 other matrices that will not pass the test. You could then just use the valid ones:

    list_A_small <- list_A[which(valid)]
    pattern_small <- pattern[which(valid), ]
    
    outputSM_small <- list()
    for (i in seq_along(list_A_small)) {
      outputSM_small[[i]] <- StuartMaxwellTest(list_A_small[[i]])
    }
    

    Another, perhaps easier option, would be to just catch the errors using tryCatch() and return a missing value if it's not valid. For example:

    outputSM <- list()
    for (i in seq_along(list_A)) {
      outputSM[[i]] <- tryCatch(
        StuartMaxwellTest(list_A[[i]]), 
        error = function(x)NA)
    }