Search code examples
rmathmatrixarithmetic-expressionschi-squared

Can I use arithmetic when creating a matrix in R?


I need some help creating a matrix. I have a large dataset with multiple groups. Each group is sorted into cases and non cases.

For ex.

Group Cases Noncases
GroupA 4 7
GroupB 9 4
GroupC 10 3

I want to create a matrix which will compare one group to the sum of the other groups.

For instance:

Disease Category GroupA NotGroupA
Case 4 19
Noncase 7 7

The goal is to set up a matrix which will allow me to run a chisquare test and/or a Fisher's exact test (depending on sample size).

I have tried the following code to extrapolate values from my dataframe into a matrix:

GroupA <- as.table(matrix(c(df[1,3], df[1,4], (sum(df$group_cases)-df$group_cases[1])), (sum(df$Noncases)-df$Noncases[1])), nrow=2, ncol=2,
                dimnames=list(Group= c("A", "Other"),
                              Case = c(1, 0)))

However, I get the following error:

Warning message:
In matrix(c(df[1, 3], df[1, 4], (sum(df$group_cases) -  :
  data length [3] is not a sub-multiple or multiple of the number of rows [329]

It outputs a 329 row list instead of a 2 by 2 matrix.

Because I have many groups, I want R to calculate the values for me when constructing the matrix. I don't want to calculate the "NotGroup_" column separately, as that makes room for human error.

How would you all recommend constructing this matrix, and is it possible to have R calculate the sums of columns/subtract values while creating a matrix?

Thank you for your help!


Solution

  • Set up example:

    dd <- data.frame(Group = LETTERS[1:3], Cases = c(4, 9, 10),
                     Noncases = c(7,4,3))
    

    Function:

    mktab <- function(focal, data) {
        ## subset rows according to whether $Group == focal or not
        ## subset cols according to "Cases"/"Noncases"
        ## sum() the not-focal elements
        matrix(c(data[data$Group==focal, "Cases"],
                 sum(data[data$Group!=focal, "Cases"]),
                 data[data$Group==focal, "Noncases"],
                 sum(data[data$Group!=focal, "Noncases"])
                 ),
               nrow = 2,
               byrow=TRUE,
               dimnames = list(c("Case", "Noncase"),
                               c(focal, paste0("not_", focal)))
               )
    }
    
    mktab("A", dd)
                 
    

    Results:

            A not_A
    Case    4    19
    Noncase 7     7