I have a file, FASTA_16S.txt, containing paragraphs of different lengths, each with a unique code (e.g. 16S317) at the top. After reading it into R, I have a list with 413 members that looks like this:
[1] ">16S317_V._rotiferianus_A\n
AAATTGAAGAGTTTGATCATGGCTCAG..."
[2] ">16S318_Salmonella_bongori\n
AAATTGAAGAGTTTGATCATGGCTCAGATT..."
[3] ">16S319_Escherichia_coli\n
TTGAAGAGTTTGATCATGGCTCAGATTG..."
I need to replace the existing codes with the new ones from the table Code_16S:
       Old    New
1.  16S317 16S001
2.  16S318 16S307
3.  16S319 16S211
4.     ...    ...
Can anybody suggest code that would identify an old code and replace it with the new one?
Note that some codes appear in both the Old and New columns, so applying gsub or replace directly to the whole list did not work: after one substitution, two paragraphs carry the same code, so a later step changes both of them.
Below is my solution to the problem, but I don't consider it optimal.
Instead of using lapply, it may be easier with str_replace_all:
library(stringr)
library(tibble)
FASTA_16S <- str_replace_all(FASTA_16S, deframe(Code_16S))
-output
FASTA_16S
[1] ">16S001_V._rotiferianus_A\n\nAAATTGAAGAGTTTGATCATGGCTCAG..."
[2] ">16S307_Salmonella_bongori\n\nAAATTGAAGAGTTTGATCATGGCTCAGATT..."
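In case it helps to see what deframe() hands to str_replace_all: it turns a two-column table into a named vector, with the first column as the names (the patterns) and the second as the values (the replacements). A rough base R equivalent, assuming the columns are in the order Old, New as in the data below (the name lookup is just for illustration):

```r
# Same table as in the data section of this answer
Code_16S <- data.frame(Old = c("16S317", "16S318", "16S319"),
                       New = c("16S001", "16S307", "16S211"),
                       stringsAsFactors = FALSE)

# Names are the old codes, values are the new codes
lookup <- setNames(Code_16S$New, Code_16S$Old)
lookup
```

This named vector can be passed to str_replace_all in place of deframe(Code_16S).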
FASTA_16S <- c(">16S317_V._rotiferianus_A\n\nAAATTGAAGAGTTTGATCATGGCTCAG...",
">16S318_Salmonella_bongori\n\nAAATTGAAGAGTTTGATCATGGCTCAGATT..."
)
Code_16S <- structure(list(Old = c("16S317", "16S318", "16S319"), New = c("16S001",
"16S307", "16S211")), class = "data.frame", row.names = c("1.",
"2.", "3."))
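One caveat, as far as I understand stringr: with a named vector, str_replace_all applies the pairs one after another to each string, so if a new code is also listed further down as an old code, a header could be rewritten twice. That is the collision described in the question. A possible workaround is to look up each header's code exactly once. Below is a minimal base R sketch, assuming every record starts with ">" followed by the code and an underscore. The toy data, the overlapping code 16S001 and the species name "Hypothetical_species" are made up for illustration.

```r
# Toy data (hypothetical): 16S001 appears as a New code AND as an Old code
fasta <- c(">16S317_V._rotiferianus_A\n\nAAATTG...",
           ">16S001_Hypothetical_species\n\nTTGAAG...")
codes <- data.frame(Old = c("16S317", "16S001"),
                    New = c("16S001", "16S050"),
                    stringsAsFactors = FALSE)

# Named lookup: names are old codes, values are new codes
lookup <- setNames(codes$New, codes$Old)

# Locate ">" plus the code (everything up to the first underscore)
hit <- regexpr("^>[^_]+", fasta)
len <- attr(hit, "match.length")   # -1 where a record has no such header

# Extract the old code without the leading ">" and look it up once
old <- substring(fasta, 2, len)
new <- unname(lookup[old])
found <- !is.na(new)               # records whose code is in the table

# Rebuild only the matched headers; the rest of each record is untouched
fasta[found] <- paste0(">", new[found],
                       substring(fasta[found], len[found] + 1))
fasta
```

Because each record is looked up a single time, the first header ends up with 16S001 and is not changed again to 16S050, while the second goes from 16S001 to 16S050. Records whose code is not in the table are left as they are.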