I have some data that I'm summarising as contingency tables. Several entries in the data are either missing or error values. Constructing the tables with table(), as in the code below, is very useful, because I can see by inspection how much of the data is missing or nonsense.
Knowing in advance which data items I want to retain, how can I select a subset of the data? For example, a small table with a portion of the data is:
```r
> my.tab <- table(sm.pos.grp, sm.neg.grp)
> my.tab
          sm.neg.grp
sm.pos.grp  zz  Zz  ZZ
        00   0   9   1
        zz   0   0  31
        Zz  11   5   7
        ZZ   0  77 211
```
I'm only interested in the zz, ZZ and Zz entries, so I can extract the relevant subset of the table like this:
```r
> my.tab[2:4, ]
          sm.neg.grp
sm.pos.grp  zz  Zz  ZZ
        zz   0   0  31
        Zz  11   5   7
        ZZ   0  77 211
```
However, the full data set is more complex:
```r
            full.pos.grp
full.neg.grp 00  zz  zZ  Zz   ZZ ZTRUE TRUEz TRUEZ TRUEFalse
   00         0   0   0   0    4     0     0     0         0
   zz         5 126 140 151  258    15     0     0         0
   zZ         3 123 547   0  616     0     0     0         0
   Zz         2 120   0 513  572     0     0     2         0
   ZZ        19 277 642 293 2286     0     5    28         0
   TRUEz      0   0   0   1    3     0     0     0         0
   TRUEZ      0   9   0   2   18     0     1    16         1
   TRUEFalse  0   0   0   0    0     1     0     1         0
```
How can I subset the table by reference only to zz, Zz, zZ and ZZ? Converting to a data frame using as.data.frame(my.tab) loses the table structure, and I can't seem to get the syntax right for tapply (e.g. I tried things like tapply(sm.neg.grp, sm.pos.grp, sum) without success). Any help much appreciated!
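For comparison, the data-frame route can be made to work if the table is rebuilt afterwards: as.data.frame() on a table returns one row per cell plus a Freq count column, and xtabs() can cross-tabulate Freq back into a table. A minimal sketch on the small table, re-created from its dput() output; df, keep, sub.df and sub.tab are illustrative names, not part of the original code:

```r
# Re-create the small table from the question's dput() output
my.tab <- structure(c(0L, 0L, 11L, 0L, 9L, 0L, 5L, 77L, 1L, 31L, 7L, 211L),
                    .Dim = c(4L, 3L),
                    .Dimnames = list(sm.pos.grp = c("00", "zz", "Zz", "ZZ"),
                                     sm.neg.grp = c("zz", "Zz", "ZZ")),
                    class = "table")

# One row per cell; the grouping columns are factors whose levels
# follow the table's dimnames order, and the counts are in Freq
df <- as.data.frame(my.tab)

# Keep only the wanted row categories
keep <- c("zz", "Zz", "ZZ")
sub.df <- df[df$sm.pos.grp %in% keep, ]

# Sum Freq back into a two-way table; drop.unused.levels = TRUE stops the
# now-empty "00" level from reappearing as a row of zeros
sub.tab <- xtabs(Freq ~ sm.pos.grp + sm.neg.grp, data = sub.df,
                 drop.unused.levels = TRUE)
print(sub.tab)
```

For the full table the same %in% filter would be applied to both grouping columns before calling xtabs().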
Here's the dput() output for the tables:
```r
> dput(my.tab)
structure(c(0L, 0L, 11L, 0L, 9L, 0L, 5L, 77L, 1L, 31L, 7L, 211L), .Dim = c(4L,
3L), .Dimnames = structure(list(sm.pos.grp = c("00", "zz", "Zz",
"ZZ"), sm.neg.grp = c("zz", "Zz", "ZZ")), .Names = c("sm.pos.grp",
"sm.neg.grp")), class = "table")
> dput(the.table)
structure(c(0L, 5L, 3L, 2L, 19L, 0L, 0L, 0L, 0L, 126L, 123L,
120L, 277L, 0L, 9L, 0L, 0L, 140L, 547L, 0L, 642L, 0L, 0L, 0L,
0L, 151L, 0L, 513L, 293L, 1L, 2L, 0L, 4L, 258L, 616L, 572L, 2286L,
3L, 18L, 0L, 0L, 15L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L,
5L, 0L, 1L, 0L, 0L, 0L, 0L, 2L, 28L, 0L, 16L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 0L), .Dim = 8:9, .Dimnames = structure(list(full.case.grp = c("00",
"zz", "zZ", "Zz", "ZZ", "TRUEz", "TRUEZ", "TRUEFalse"), full.ctrl.grp = c("00",
"zz", "zZ", "Zz", "ZZ", "ZTRUE", "TRUEz", "TRUEZ", "TRUEFalse")),
.Names = c("full.neg.grp", "full.pos.grp")), class = "table")
```
To subset your table by reference (i.e. by row and column names), you can put the names directly inside the square brackets:
```r
> n <- c("zz", "Zz", "zZ", "ZZ")
> the.table[n, n]
            full.pos.grp
full.neg.grp  zz  Zz  zZ   ZZ
          zz 126 151 140  258
          Zz 120 513   0  572
          zZ 123   0 547  616
          ZZ 277 293 642 2286
```
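Alternatively, since the wanted categories are known in advance, they can be fixed before tabulating: factor() with levels set to the wanted categories turns every other value into NA, and table() leaves NA out by default. A minimal sketch; the vectors neg and pos are made-up stand-ins for the raw grouping variables, not the real data:

```r
# Hypothetical raw grouping vectors (stand-ins for the question's data)
neg <- c("zz", "ZZ", "TRUEz", "Zz", "zZ", "00", "ZZ", "zz")
pos <- c("ZZ", "ZZ", "ZZ", "zz", "zZ", "ZZ", "TRUEZ", "Zz")

# Categories to keep, in the order they should appear in the table
keep <- c("zz", "Zz", "zZ", "ZZ")

# Values not listed in 'levels' become NA in the factor; table() drops
# NA by default (useNA = "no"), so those pairs are simply not counted
tab <- table(full.neg.grp = factor(neg, levels = keep),
             full.pos.grp = factor(pos, levels = keep))
print(tab)
```

The trade-off is that the unwanted observations are discarded before counting, so the missing and nonsense counts are no longer visible; tabulating everything and then indexing by name, as above, keeps both views available.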