Tags: regex, vim, pattern-matching, dna-sequence

Finding the Pattern of a Gene Sequence Record Using a Regular Expression


Which regular expression can I use to remove the numbers and the whitespace from the following record?

    1 cctataactt ggaatgtggg tggaggggtt catagttctc cctgagtgag acttgcctgc
   61 ttctctggcc cctggtcctg tcctgttctc cagcatggtg tgtctgaagc tccctggagg
  121 ctcctgcatg acagcgctga cagtgacact gatggtgctg agctccccac tggctttgtc
  181 tggggacacc cgacgtaagt gcacattgcg ggtgctgagc tactatgggg tggggaaaat
 0921 ggcctgaagt cccagcattg atggcagcgc ctcatcttca acttttgtgc tcccctttgc
10981 ctaaaccgta tggcctcccg tgcatctgta ttcaccctgt atgacaaaca cattacatta
11041 ttaaatgttt ctcaaagatg gagttaaa

I used the following expression, which matches every line except the last one:

    (\s+\d+\s)\w+(\s)\w+(\s)\w+(\s)\w+(\s)\w+(\s)\w+(\s+)(\d+)
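The expression above spells out exactly six sequence blocks followed by another position number, so it cannot match the final line, which has only three blocks and nothing after them. A minimal Python sketch of a per-line pattern that repeats the block group with `+` instead, so a short last line still matches. It assumes each line is an optional indent, a position number, and whitespace-separated blocks of lowercase letters (`[a-z]` is an assumption that also admits IUPAC codes such as `n`); the names `RECORD`, `LINE`, and `sequence_lines` are illustrative only.

```python
import re

# The record from the question, lines kept verbatim.
RECORD = """\
    1 cctataactt ggaatgtggg tggaggggtt catagttctc cctgagtgag acttgcctgc
   61 ttctctggcc cctggtcctg tcctgttctc cagcatggtg tgtctgaagc tccctggagg
  121 ctcctgcatg acagcgctga cagtgacact gatggtgctg agctccccac tggctttgtc
  181 tggggacacc cgacgtaagt gcacattgcg ggtgctgagc tactatgggg tggggaaaat
 0921 ggcctgaagt cccagcattg atggcagcgc ctcatcttca acttttgtgc tcccctttgc
10981 ctaaaccgta tggcctcccg tgcatctgta ttcaccctgt atgacaaaca cattacatta
11041 ttaaatgttt ctcaaagatg gagttaaa
"""

# One line: optional indent, a position number, then ONE OR MORE
# whitespace-separated blocks of letters (captured together in group 1).
# The "+" on the non-capturing group is what lets the short last line match.
LINE = re.compile(r"^\s*\d+((?:\s+[a-z]+)+)\s*$", re.MULTILINE)


def sequence_lines(record):
    """Return the bases of each record line with the inner spaces removed."""
    return [re.sub(r"\s+", "", m.group(1)) for m in LINE.finditer(record)]


for bases in sequence_lines(RECORD):
    print(bases)
```

Matching whole lines like this also gives you the position number for free (it is the `\d+` before group 1), should you want to keep it.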

Solution

  • You can use this

    :%s/\d\+\|\s\+//g
    

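    to remove all numbers (\d\+) and whitespace (\s\+) from your buffer.
    \| separates the two alternatives, and the g flag replaces every match on a line.
    Because \s does not match the end-of-line, each line of the record stays on its own line;
    use \_s\+ instead of \s\+ if you also want the lines joined into one sequence.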
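Outside Vim, the same "digits or whitespace" alternation carries over directly. A minimal Python sketch using `re.sub` on a short excerpt of the record (the variable names are illustrative). One difference worth knowing: Python's `\s` also matches `\n`, so unlike the Vim command this joins all lines into a single sequence string.

```python
import re

# An excerpt of the record (first two lines and the last line only).
RECORD = """\
    1 cctataactt ggaatgtggg tggaggggtt catagttctc cctgagtgag acttgcctgc
   61 ttctctggcc cctggtcctg tcctgttctc cagcatggtg tgtctgaagc tccctggagg
11041 ttaaatgttt ctcaaagatg gagttaaa
"""

# Delete every run of digits OR run of whitespace (newlines included in Python).
sequence = re.sub(r"\d+|\s+", "", RECORD)
print(sequence)
```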