I have a table with the following data:
ex = structure(list(A = c(482, 208, 227, 239, 783, 141), B = c(155,
69, 63, 65, 255, 25), C = c(64, 24, 28, 29, 134, 34), D = c(408,
180, 196, 207, 638, 104)), .Names = c("A", "B", "C", "D"), class = "data.frame", row.names = c("E",
"F", "G", "H", "I", "J"))
>ex
A B C D
E 482 155 64 408
F 208 69 24 180
G 227 63 28 196
H 239 65 29 207
I 783 255 134 638
J 141 25 34 104
I want to comput the chisq.test() for all pairs of rows for A vs B and for C vs D. This sounds pretty ambiguous to me so here is a concrete example:
A B C D
E 482 155 E 64 408
F 208 69 F 24 180
A B C D
E 482 155 E 64 408
G 227 63 G 28 196
... repeat for all pairs of E, F, G, H, I, and J
compute P values using chisq.tests() for each of these tables.
I have already done this, but it's output is in an annoying format. Basically I used combn(rownames(ex),2)
to get the pairs, then wrote an lapply that went through the result of combn, constructed the matrices from the table, then and gave me the chisq of the matrix.
tests = combn(rownames(ex), 2)
tests = apply(tests,2,list)
testResults = lapply(tests, function(cat){
test = unlist(cat)
AvsBm = matrix(c(ex[test[1],'A'],ex[test[2],'A'],ex[test[1],'B'],ex[test[2],'B']),nrow=2, ncol=2)
AvsBp = chisq.test(AvsBm)$p.value
CvsDm = matrix(c(ex[test[1],'C'],ex[test[2],'C'],ex[test[1],'D'],ex[test[2],'D']),nrow=2, ncol=2)
CvsDp = chisq.test(CvsDm)$p.value
a = c(test[1], test[2], AvsBp, CvsDp)
})
testResults = data.frame(do.call(rbind, testResults))
names(testResults) = c('Var1', 'Var2', 'AvsB', 'CvsD')
The results looked like this:
> testResults
Var1 Var2 AvsB CvsD
1 E F 0.918199692198075 0.608649272659481
2 E G 0.432572099792864 0.790459437339736
3 E H 0.358651246275942 0.723319426118104
4 E I 0.960564133271506 0.0896848347203047
5 E J 0.0144029906033956 0.0028292317888333
6 F G 0.424982446036333 0.932706790402674
7 F H 0.36295549985099 0.982958067120215
8 F I 0.968631154321032 0.0684734899837275
9 F J 0.0195800439529193 0.00302299304015596
10 G H 0.998659183486833 0.999999999999997
11 G I 0.354996420259763 0.102779771508206
12 G J 0.107030315095613 0.00460404677366423
13 H I 0.284826573788384 0.0801050087692166
14 H J 0.123057932646613 0.00332480813135708
15 I J 0.00951511015485216 0.0559833381301495
This works ok, but it feels like it should be a lot easier. I have to do a lot of reformatting of the table afterwards to get it into a pretty looking table. The ideal format would be two triangular tables one for A-vs-B and the other for C-vs-D.
Are there any built in functions that do this kind of thing?
Hope my question isn't too vague, Cheers.
You can use this instead:
within(as.data.frame(t(combn(rownames(ex), 2)), stringsAsFactors=FALSE), {
CvsDp <- mapply(function(i,j)chisq.test(ex[c(i,j),c("C","D")])$p.value,V1,V2)
AvsBp <- mapply(function(i,j)chisq.test(ex[c(i,j),c("A","B")])$p.value,V1,V2)
})
Result
V1 V2 AvsBp CvsDp
1 E F 0.91819969 0.608649273
2 E G 0.43257210 0.790459437
3 E H 0.35865125 0.723319426
4 E I 0.96056413 0.089684835
5 E J 0.01440299 0.002829232
6 F G 0.42498245 0.932706790
7 F H 0.36295550 0.982958067
8 F I 0.96863115 0.068473490
9 F J 0.01958004 0.003022993
10 G H 0.99865918 1.000000000
11 G I 0.35499642 0.102779772
12 G J 0.10703032 0.004604047
13 H I 0.28482657 0.080105009
14 H J 0.12305793 0.003324808
15 I J 0.00951511 0.055983338
EDIT: as a triangular table, given x
= the result above:
m <- matrix(nrow=nrow(ex), ncol=nrow(ex))
rownames(m) <- colnames(m) <- rownames(ex)
m[cbind(x$V1,x$V2)] <- x$AvsBp
Result
E F G H I J
E NA 0.9181997 0.4325721 0.3586512 0.9605641 0.01440299
F NA NA 0.4249824 0.3629555 0.9686312 0.01958004
G NA NA NA 0.9986592 0.3549964 0.10703032
H NA NA NA NA 0.2848266 0.12305793
I NA NA NA NA NA 0.00951511
J NA NA NA NA NA NA
For CvsDp
just replace it in the last line.