I have a large data set and need to compare all combinations between columns. The desired output would be a matrix for each column combination.
The data frame to start with could look like Data
:
set.seed(1)
Data <- data.frame(
ID = (1:100),
A = sample(1:10,10),
B = sample(1:20,100,replace = T),
C = sample(1:5,100,replace = T),
D = sample(1:20,100,replace = T)
)
Data
and I want to know how often appears the same combination within two columns.
(e.g. how often is a 1
in A
in the same row with a 4
in B
)
for all combinations between column A to D?
I was using:
require(dplyr)
X1 <- ddply(Data,.(A,B),transmute, count=length(ID))
and get a object like:
A B count
1 1 3 1
2 1 7 1
3 1 9 2
4 1 9 2
5 1 12 1
6 1 13 1
7 1 14 1
8 1 16 1
9 1 18 1
10 1 20 1
11 2 2 1
12 2 6 1
13 2 10 1
14 2 11 1
But how can I get the count
result in a matrix format?
The output for A vs. B cold look like:
B1 B2 B3 B4 B5 B6
A1 1 1 2 1 1 ...
A2 1 1 2 1 1
A3 2 1 1 1 1
A4 2 1 1 1 1
A5 1 1 2 1 2
A6 1 1 2 1 2
A7 1 3 1 1 1
A8 1 3 1 1 2
A9 1 3 2 1 2
A10 1 1 2 1 1
In the best case the result would be a `list` containing the objects `AB` `AC` ...`CD` as matrix.
You can do this :
library(tidyverse)
X2 <-X1 %>% group_by(A,B) %>%
summarise(count=max(count)) %>% #use max instead of sum
ungroup() %>%
mutate(A=paste0("A",A),B=paste0("B",B)) %>%
spread(B,count,fill=0)
X3 <- as.matrix(X2[,2:ncol(X2)])
rownames(X3) <- as.character(X2$A)