Problem description
I have a list of strings of equal size like this:
example.list <- c('BBCD','ABBC','ADDB','ACBB')
Then I want to obtain the frequency of occurrence of specific letters at specific positions. First I convert this to a matrix:
     A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 D4
[1,]  0  1  0  0  0  1  0  0  0  0  1  0  0  0  0  1
[2,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  1  0
[3,]  1  0  0  0  0  0  0  1  0  0  0  1  0  1  0  0
[4,]  1  0  0  0  0  0  1  0  0  1  0  0  0  1  0  0
[5,]  1  0  0  0  0  1  0  0  0  1  0  0  0  0  0  1
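The conversion step is not shown above, so here is a minimal sketch of one way such a 0/1 matrix could be built. The names chars, alphabet, labels, observed and onehot are hypothetical, and the alphabet A to D is an assumption taken from the example. It uses the four strings in example.list, so it yields four rows.

```r
example.list <- c('BBCD', 'ABBC', 'ADDB', 'ACBB')

# One row per string, one column per position
chars <- do.call(rbind, strsplit(example.list, ""))

# Column labels in the order A1 B1 C1 D1 A2 ... (alphabet is assumed)
alphabet <- c("A", "B", "C", "D")
labels <- paste0(rep(alphabet, times = ncol(chars)),
                 rep(seq_len(ncol(chars)), each = length(alphabet)))

# Tag every observed letter with its position, e.g. "B1"
observed <- matrix(paste0(chars, col(chars)), nrow = nrow(chars))

# For each string, mark which labels it contains
onehot <- t(apply(observed, 1, function(row) as.integer(labels %in% row)))
colnames(onehot) <- labels
onehot
```

Each row sums to the string length, because every position contributes exactly one letter.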
Now I want to obtain the frequency of each column combination. Some examples:
A1 : B2 = 2
A1 : B3 = 3
B1 : B2 = 1
... etc.
Split the strings into a list, s, of vectors of single characters. Set n to their common length and create a matrix v from s whose columns contain elements such as B1, etc. Then use xtabs to create counts giving m1 and combn to get counts of pairs in m2.
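# split each string into a vector of single characters
s <- strsplit(example.list, "")
# common string length
n <- lengths(s)[1]
# append the position to each letter, e.g. "B1"; one column per string
v <- sapply(s, paste0, 1:n)
# count each letter/position label per string
m1 <- xtabs(~., data.frame(colv = c(col(v)), v = c(v)))
# for every pair of labels, count the strings that contain both
m2 <- combn(1:ncol(m1), 2, function(ix) sum(m1[, ix[1]] * m1[, ix[2]]))
names(m2) <- combn(colnames(m1), 2, paste, collapse = "")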
giving:
> m1
    v
colv A1 B1 B2 B3 B4 C2 C3 C4 D2 D3 D4
   1  0  1  1  0  0  0  1  0  0  0  1
   2  1  0  1  1  0  0  0  1  0  0  0
   3  1  0  0  0  1  0  0  0  1  1  0
   4  1  0  0  1  1  1  0  0  0  0  0
> m2
A1B1 A1B2 A1B3 A1B4 A1C2 A1C3 A1C4 A1D2 A1D3 A1D4 B1B2 B1B3 B1B4 B1C2 B1C3 B1C4
   0    1    2    2    1    0    1    1    1    0    1    0    0    0    1    0
B1D2 B1D3 B1D4 B2B3 B2B4 B2C2 B2C3 B2C4 B2D2 B2D3 B2D4 B3B4 B3C2 B3C3 B3C4 B3D2
   0    0    1    1    0    0    1    1    0    0    1    1    1    0    1    0
B3D3 B3D4 B4C2 B4C3 B4C4 B4D2 B4D3 B4D4 C2C3 C2C4 C2D2 C2D3 C2D4 C3C4 C3D2 C3D3
   0    0    1    0    0    1    1    0    0    0    0    0    0    0    0    0
C3D4 C4D2 C4D3 C4D4 D2D3 D2D4 D3D4
   1    0    0    0    1    0    0
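If it is more convenient to look up a pair by name, the same counts can probably also be obtained as a square matrix. The sketch below is an alternative to the combn step, not part of the code above: it rebuilds m1 the same way and applies crossprod, whose result at [i, j] is the sum over strings of m1[, i] * m1[, j]. The name pair.counts is hypothetical.

```r
example.list <- c('BBCD', 'ABBC', 'ADDB', 'ACBB')

# Rebuild m1 exactly as above
s <- strsplit(example.list, "")
n <- lengths(s)[1]
v <- sapply(s, paste0, 1:n)
m1 <- xtabs(~., data.frame(colv = c(col(v)), v = c(v)))

# crossprod(m1) is t(m1) %*% m1: a label-by-label co-occurrence matrix
pair.counts <- crossprod(m1)

pair.counts["A1", "B3"]  # 2, the same value as m2["A1B3"]
pair.counts["A1", "B4"]  # 2, the same value as m2["A1B4"]
```

The matrix is symmetric, and its diagonal holds the number of strings containing each single label, so it carries slightly more than m2 at the cost of storing every pair twice.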