Tags: r, dataframe, genetics

Convert a data frame of SNP genotypes into a numeric matrix


snp1 <- c("AA", "AT", "AA", "TT", "AA", "AT", "AA", "AA", "AA", "AT")
snp2 <- c("GG", "GC", "GG", "CC", "CC", "GC", "GG", "GG", "GG", "GC")
df1 <- data.frame(snp1, snp2)

num1 <- c(1, 2, 1, 3, 1, 2, 1, 1, 1, 2)
num2 <- c(1, 2, 1, 3, 3, 2, 1, 1, 1, 2)
df2 <- data.frame(num1, num2)

This is in R. I have a data frame df1 that I want to convert to df2. Within each column of df1, the most common value should become 1, the second most common value 2, and so on. How can I do this efficiently?


Solution

  • Variation on a theme:

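    # levels(x) is NULL when the columns are character vectors (the data.frame()
    # default since R 4.0.0), so take the value names from table(x) instead.
    lapply(df1, function(x) { tab <- table(x); match(x, names(tab)[order(-tab)]) })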
    #$snp1
    # [1] 1 2 1 3 1 2 1 1 1 2
    #
    #$snp2
    # [1] 1 2 1 3 3 2 1 1 1 2
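
The call above returns a list, while the question asks for a data frame or numeric matrix. A minimal sketch of one way to get there, assuming every column has the same length: `rank_by_freq` is a hypothetical helper name introduced here for illustration, and the column names of the result follow df1 (snp1, snp2) rather than num1 and num2.

```r
# Example data from the question
snp1 <- c("AA", "AT", "AA", "TT", "AA", "AT", "AA", "AA", "AA", "AT")
snp2 <- c("GG", "GC", "GG", "CC", "CC", "GC", "GG", "GG", "GG", "GC")
df1 <- data.frame(snp1, snp2)

# Hypothetical helper: recode values by frequency rank (1 = most common).
# Works for character and factor columns alike.
rank_by_freq <- function(x) {
  tab <- table(x)                     # counts per distinct value
  match(x, names(tab)[order(-tab)])   # position in the frequency-sorted names
}

# sapply() simplifies equal-length results into a matrix, one column per SNP
m <- sapply(df1, rank_by_freq)

# Wrap the matrix in a data frame if the shape of df2 is needed
df2 <- as.data.frame(m)
```

Values with equal counts are ranked in the order `table()` lists them (alphabetical for character data), and missing genotypes stay `NA` because `table()` excludes them by default.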