r, dataframe, matrix, dplyr

How to convert a dataframe into a matrix summarising certain columns in R?


I have the following dataframe:

# data.frame() builds the table from named columns
# (as.data.frame() does not accept columns this way)
df <- data.frame(
  FID = c("1234", "1234", "4567", "4567", "2345", "2345"),
  genotype_column = rep("chr26_1234_A_G", 6),
  dataset = c("type1", "type2", "type1", "type2", "type1", "type2"),
  genotype_type = c("AA", "Aa", "AA", "aa", "AA", "AA")
)

From this dataframe I want to create a matrix that looks like this:

matrix(0, nrow = 3, ncol = 3,
       dimnames = list(c("AA_type1", "Aa_type1", "aa_type1"),
                       c("AA_type2", "Aa_type2", "aa_type2")))

         AA_type1 Aa_type1 aa_type1
AA_type2      999       23        4
Aa_type2       87       12        4
aa_type2       13       10        1

This matrix should count how many individuals (values of the FID column) have each combination of genotype (AA, Aa, aa) across the two values of the dataset column. The diagonal would therefore hold the individuals whose genotype is the same in type1 and type2, while the off-diagonal cells would hold the individuals whose genotype differs between type1 and type2. How can I do this in R? Is there a function that does it? Thanks!


Solution

  • Using reshape (for reshaping long to wide) and xtabs (for tabulation)

    xtabs(
      ~.,
      data=reshape(
        df,
        timevar = "dataset",
        idvar = "FID",
        drop = "genotype_column",
        direction = "wide"
      )[,-1]
    )
    
                       genotype_type.type2
    genotype_type.type1 aa Aa AA
                     AA  1  1  1
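
With the sample data, only the genotypes that actually occur appear in the table, so the result has a single row. If the full 3 x 3 layout from the question is needed, one possible sketch is to fix the factor levels before tabulating. This assumes exactly one row per FID and dataset; the names wide, lv and m are illustrative and not part of the original answer.

```r
# Sample data from the question
df <- data.frame(
  FID = c("1234", "1234", "4567", "4567", "2345", "2345"),
  genotype_column = rep("chr26_1234_A_G", 6),
  dataset = c("type1", "type2", "type1", "type2", "type1", "type2"),
  genotype_type = c("AA", "Aa", "AA", "aa", "AA", "AA")
)

# Long to wide: one row per FID, one genotype column per dataset
wide <- reshape(
  df,
  timevar = "dataset",
  idvar = "FID",
  drop = "genotype_column",
  direction = "wide"
)

# Fixed levels keep all three genotypes in the table, even with zero counts
lv <- c("AA", "Aa", "aa")
tab <- table(
  factor(wide$genotype_type.type1, levels = lv),
  factor(wide$genotype_type.type2, levels = lv)
)

# Plain matrix with the row and column names used in the question
m <- matrix(
  tab,
  nrow = length(lv),
  dimnames = list(paste0(lv, "_type1"), paste0(lv, "_type2"))
)
m
```

Here rows are the type1 genotypes and columns the type2 genotypes, so sum(diag(m)) counts the individuals whose genotype agrees between the two datasets.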