Getting Conditional Subset of Contingency Table


I have some data that I'm summarising as contingency tables. There are several entries in the data which are either missing or error values. Constructing the tables using table, as per the code below, is very useful as I can see by inspection how much of the data is missing or nonsense.

Knowing in advance which data items I want to retain, how can I select a subset of the data? For example, a small table with a portion of the data is:

my.tab <- table(sm.pos.grp, sm.neg.grp)

      sm.neg.grp
sm.pos.grp  zz  Zz  ZZ
        00   0   9   1
        zz   0   0  31
        Zz  11   5   7
        ZZ   0  77 211

I'm only interested in the zz, ZZ, and Zz entries, so I can extract the relevant subset of the table like this:

my.tab[2:4, ]

      sm.neg.grp
sm.pos.grp  zz  Zz  ZZ
        zz   0   0  31
        Zz  11   5   7
        ZZ   0  77 211

However, the full data set is more complex:

        full.pos.grp
full.neg.grp   00   zz   zZ   Zz   ZZ ZTRUE TRUEz TRUEZ TRUEFalse
   00           0    0    0    0    4     0     0     0         0
   zz           5  126  140  151  258    15     0     0         0
   zZ           3  123  547    0  616     0     0     0         0
   Zz           2  120    0  513  572     0     0     2         0
   ZZ          19  277  642  293 2286     0     5    28         0
   TRUEz        0    0    0    1    3     0     0     0         0
   TRUEZ        0    9    0    2   18     0     1    16         1
   TRUEFalse    0    0    0    0    0     1     0     1         0

How can I subset the table by reference only to zz, Zz, zZ and ZZ? Converting to a data frame using as.data.frame(my.tab) loses the table structure, and I can't seem to get the syntax right for tapply (e.g. I tried things like tapply(sm.neg.grp, sm.pos.grp, sum) without success). Any help much appreciated!

Here is the dput output for the tables:

> dput(my.tab)
structure(c(0L, 0L, 11L, 0L, 9L, 0L, 5L, 77L, 1L, 31L, 7L, 211L), .Dim = c(4L, 
3L), .Dimnames = structure(list(sm.pos.grp = c("00", "zz", "Zz", 
"ZZ"), sm.neg.grp = c("zz", "Zz", "ZZ")), .Names = c("sm.pos.grp", 
"sm.neg.grp")), class = "table")  

> dput(the.table)
structure(c(0L, 5L, 3L, 2L, 19L, 0L, 0L, 0L, 0L, 126L, 123L, 
120L, 277L, 0L, 9L, 0L, 0L, 140L, 547L, 0L, 642L, 0L, 0L, 0L, 
0L, 151L, 0L, 513L, 293L, 1L, 2L, 0L, 4L, 258L, 616L, 572L, 2286L, 
3L, 18L, 0L, 0L, 15L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 
5L, 0L, 1L, 0L, 0L, 0L, 0L, 2L, 28L, 0L, 16L, 1L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 0L), .Dim = 8:9, .Dimnames = structure(list(full.case.grp = c("00", 
"zz", "zZ", "Zz", "ZZ", "TRUEz", "TRUEZ", "TRUEFalse"), full.ctrl.grp = c("00", 
"zz", "zZ", "Zz", "ZZ", "ZTRUE", "TRUEz", "TRUEZ", "TRUEFalse")), 
.Names = c("full.neg.grp", "full.pos.grp")), class = "table")

Solution

  • To subset your table by reference (i.e. by row and column names), you can put the names directly inside the square brackets. Use the full table (the.table) here, since my.tab has no zZ level.

    n <- c("zz", "Zz", "zZ", "ZZ")
    the.table[n, n]
    
                full.pos.grp
    full.neg.grp  zz  Zz  zZ   ZZ
              zz 126 151 140  258
              Zz 120 513   0  572
              zZ 123   0 547  616
              ZZ 277 293 642 2286
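
Indexing by name fails with "subscript out of bounds" if a requested name is missing from one of the dimensions, as happens with zZ in the small table. A minimal sketch of one way around that, assuming you simply want to keep whichever of the wanted levels are present; the names wanted, keep.rows, keep.cols and sub.tab are illustrative and not part of the original post:

```r
# Rebuild the small table from the dput() output in the question.
my.tab <- structure(
  c(0L, 0L, 11L, 0L, 9L, 0L, 5L, 77L, 1L, 31L, 7L, 211L),
  .Dim = c(4L, 3L),
  .Dimnames = list(sm.pos.grp = c("00", "zz", "Zz", "ZZ"),
                   sm.neg.grp = c("zz", "Zz", "ZZ")),
  class = "table"
)

# Levels to retain; "zZ" does not occur in this small table.
wanted <- c("zz", "Zz", "zZ", "ZZ")

# Keep only the wanted names that actually exist in each dimension.
keep.rows <- intersect(wanted, rownames(my.tab))
keep.cols <- intersect(wanted, colnames(my.tab))

# drop = FALSE keeps the result two-dimensional even if only one name survives.
sub.tab <- my.tab[keep.rows, keep.cols, drop = FALSE]
sub.tab
```

Because intersect() preserves the order of its first argument, the rows and columns come out in the order given in wanted, and the 00 row is dropped without having to refer to its position.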