r, string, alignment

How to align a set of strings on a character, with a given number of characters on each side (replace missing characters with "x")?


I have a set of strings, each containing exactly one "X" character:

c("KGDDQSXQGGAPDAGQE", "TEEDSEEVXEQK", "LTXTSGETTQTHTEPTGDSK", "IXTHNSEVEEDDMDK", "SXENPEEDEDQRNPAK", "XTAEHEAAQQDLQSK", "ATVIXHGETLRRTK", "XAVAREESGKPGAHVTVK", "YHTINGHNAEVXK", "XAAEDDEDDDVDTK")

I would like to get a character vector in which each element has 11 characters: the "X" sits in the center, with 5 characters from the original string on each side. If fewer than 5 characters are available on a side, the missing positions are filled with "x".

For example:
"KGDDQSXQGGAPDAGQE" becomes "GDDQSXQGGAP"

"TEEDSEEVXEQK" becomes "DSEEVXEQKxx"

"LTXTSGETTQTHTEPTGDSK" becomes "xxxLTXTSGET"


Solution

  • One more approach, using stringr:

    library(stringr)
    
    vec <- c("KGDDQSXQGGAPDAGQE", "TEEDSEEVXEQK", "LTXTSGETTQTHTEPTGDSK", "IXTHNSEVEEDDMDK", "SXENPEEDEDQRNPAK", "XTAEHEAAQQDLQSK", "ATVIXHGETLRRTK", "XAVAREESGKPGAHVTVK", "YHTINGHNAEVXK", "XAAEDDEDDDVDTK")
    
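    vec %>%
      # add 5 "x" characters to each end (10 extra characters in total)
      str_pad(width = nchar(vec) + 10, 
              side = "both", pad = "x") %>%
      # keep the 5 characters on either side of the uppercase "X"
      str_match(".{5}X.{5}")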
    #>       [,1]         
    #>  [1,] "GDDQSXQGGAP"
    #>  [2,] "DSEEVXEQKxx"
    #>  [3,] "xxxLTXTSGET"
    #>  [4,] "xxxxIXTHNSE"
    #>  [5,] "xxxxSXENPEE"
    #>  [6,] "xxxxxXTAEHE"
    #>  [7,] "xATVIXHGETL"
    #>  [8,] "xxxxxXAVARE"
    #>  [9,] "HNAEVXKxxxx"
    #> [10,] "xxxxxXAAEDD"
    

    Created on 2020-04-26 by the reprex package (v0.3.0)
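
Note that str_match() returns a one-column matrix, while the question asks for a character vector (stringr's str_extract() with the same pattern returns a vector instead). The same pad-then-slice idea can also be sketched in base R, with no packages. The helper name align_on_x and its arguments n and pad are hypothetical, chosen only for this illustration, and the sketch assumes each string contains at most one uppercase "X" and that the padding character differs from it.

```r
# Hypothetical helper (not part of the original answer):
# centre each string on "X", keeping `n` characters on each side and
# filling missing positions with `pad`.
align_on_x <- function(strings, n = 5, pad = "x") {
  # Pad both ends with n copies of the fill character, so that n characters
  # always exist on either side of the "X".
  padding <- strrep(pad, n)
  padded <- paste0(padding, strings, padding)

  # Position of the (case-sensitive) uppercase "X" in each padded string.
  pos <- as.integer(regexpr("X", padded, fixed = TRUE))
  pos[pos < 0] <- NA  # strings without an "X" yield NA

  # Slice out the window of 2 * n + 1 characters centred on "X".
  substr(padded, pos - n, pos + n)
}

vec <- c("KGDDQSXQGGAPDAGQE", "TEEDSEEVXEQK", "LTXTSGETTQTHTEPTGDSK")
aligned <- align_on_x(vec)
print(aligned)
```

For the three example strings in the question, this gives "GDDQSXQGGAP", "DSEEVXEQKxx" and "xxxLTXTSGET" as a plain character vector.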