I have the following dataframe:
# data.frame() builds the frame from named columns
# (as.data.frame() expects an existing object to convert)
df <- data.frame(
  FID = c("1234", "1234", "4567", "4567", "2345", "2345"),
  genotype_column = c("chr26_1234_A_G", "chr26_1234_A_G", "chr26_1234_A_G",
                      "chr26_1234_A_G", "chr26_1234_A_G", "chr26_1234_A_G"),
  dataset = c("type1", "type2", "type1", "type2", "type1", "type2"),
  genotype_type = c("AA", "Aa", "AA", "aa", "AA", "AA")
)
From this dataframe I want to create a matrix that looks like this:
matrix(0, nrow = 3, ncol = 3,
dimnames = list(c("AA_type1", "Aa_type1", "aa_type1"),
c("AA_type2", "Aa_type2", "aa_type2")))
         AA_type1 Aa_type1 aa_type1
AA_type2      999       23        4
Aa_type2       87       12        4
aa_type2       13       10        1
This matrix should count how many individuals (the FID column) have each combination of AA, Aa and aa across the two values of the dataset column. The diagonal therefore holds the individuals whose genotype is the same in type1 and type2, while the off-diagonal cells hold the individuals whose genotype differs between type1 and type2. How can I do this in R? Is there a function that does it? Thanks!
Use reshape (for reshaping long to wide) and xtabs (for tabulation):
xtabs(
~.,
data=reshape(
df,
timevar = "dataset",
idvar = "FID",
drop = "genotype_column",
direction = "wide"
)[,-1]
)
                   genotype_type.type2
genotype_type.type1 aa Aa AA
                 AA  1  1  1
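The result has a single row because every individual in the sample data is AA in type1, so the other genotypes never show up as levels. If the full 3 x 3 layout from the question is needed, one possible approach is to fix the levels before tabulating. The following is only a sketch: the level order AA, Aa, aa and the helper names lv, wide, tab and m are assumptions for illustration, not part of the original code.

```r
df <- data.frame(
  FID = c("1234", "1234", "4567", "4567", "2345", "2345"),
  genotype_column = rep("chr26_1234_A_G", 6),
  dataset = c("type1", "type2", "type1", "type2", "type1", "type2"),
  genotype_type = c("AA", "Aa", "AA", "aa", "AA", "AA")
)

# Assumed genotype order, taken from the desired matrix in the question
lv <- c("AA", "Aa", "aa")

# Long to wide: one row per FID, one genotype column per dataset
wide <- reshape(
  df,
  timevar = "dataset",
  idvar = "FID",
  drop = "genotype_column",
  direction = "wide"
)

# Factors with explicit levels keep genotypes that never occur as zero counts
tab <- table(
  factor(wide$genotype_type.type1, levels = lv),
  factor(wide$genotype_type.type2, levels = lv)
)

# Plain matrix: rows are type1 genotypes, columns are type2 genotypes
m <- matrix(
  as.vector(tab),
  nrow = length(lv),
  dimnames = list(paste0(lv, "_type1"), paste0(lv, "_type2"))
)
m
```

Here rows are type1 and columns are type2; t(m) gives the transposed layout if type2 should be on the rows instead.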