Group a DNA sequence in codons


I have generated a random DNA sequence:

base <- c("A","G","U")
seq <- sample(base, 15, replace = TRUE)
seq
# [1] "A" "G" "A" "U" "A" "G" "U" "A" "U" "A" "G" "U" "G" "U" "G"

How can I group the resulting sequence into codons (sets of three nucleotides) in order to look for stop codons? I need something like this:

new_seq <- c("AGA","UAG", "UAU", "AGU", "GUG")

Solution

  • Convert to a 3-column matrix filled by row, then paste the columns together:

    base <- c("A","G","U")
    set.seed(1)  # fix the seed so the sample is reproducible
    x <- sample(base, 15, replace = TRUE)
    x
    # [1] "A" "U" "A" "G" "A" "U" "U" "G" "G" "U" "U" "A" "A" "A" "G"
    
    do.call(paste0, as.data.frame(matrix(x, ncol = 3, byrow = TRUE)))
    # [1] "AUA" "GAU" "UGG" "UUA" "AAG"
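
Since the question's goal is to look for stop codons, the grouped vector can then be matched against a set of stop codons. This is a minimal sketch, not part of the original answer. It assumes the standard RNA stop codons (UAA, UAG, UGA) and reuses the sequence shown in the question. The names `codons`, `stop_codons`, and `stop_pos` are illustrative.

```r
# Sequence from the question, one base per element
x <- c("A","G","A","U","A","G","U","A","U","A","G","U","G","U","G")

# Group into codons with the matrix approach shown above
codons <- do.call(paste0, as.data.frame(matrix(x, ncol = 3, byrow = TRUE)))

# Assumed set of stop codons (standard RNA stop codons)
stop_codons <- c("UAA", "UAG", "UGA")

# Index of each codon that is a stop codon (counted in codons, not bases)
stop_pos <- which(codons %in% stop_codons)
```

Here `stop_pos` holds codon indices, so multiplying by 3 and subtracting 2 gives the position of the first base of each stop codon. Note that `matrix()` warns when the sequence length is not a multiple of 3, so any incomplete trailing codon should be trimmed first.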