I would like to add strings from one text file (file1) to a second text file (file2). The strings from file1 should be added sequentially to file2, one after every greater-than symbol (>). There are 9 greater-than symbols in file2 and 9 strings in file1. File1 contains 9 different strings on lines 1-9, in column 1, like this:
...
sctC_
sctJ_
sctV_
...
This is the while loop with sed I have tried to add the string into file2:
while IFS=$'\t' read f1 f2 ; do sed "s/^>/&$f2/" ; done < <(paste file2 file1)
However, only the first string gets added into file2 and the first line is stripped from file2:
MRNVLYAFLLTLYRGFCWSTVLLGMLPMAHAVTPPEWNKGAYAYSAEQTLLSTILIDFANSHGVELVMDN sctJ_
LKDTLVEAKIRAETPAAFLDRLALEHRFQWFVYNHTLYVSSQDTQASIRLEISPDAAPDLKQALSGIGLL sctV_
DPRFGWGELPEEGVVLVTGPQTYIDLIRNFSQQREKQDERRKVMIFPLRFASVSDRTLQYRDQRIVIPGV sctN_
ATILSELMDGQRPPPTGASGPTDAVPDSAMEAMRENTRAMLTRLATRNNPARSTDENGRLVLNGRISADV sctQ_
RNNALLVRDDEKRREEYQQLVEQIDVPQNLVNIDAIILDVDRTALSRLEANWQGTLGNVSAGSTMMMGRS sctR_
TLFVSDFKRFFADIQALEGEGTASIVANPSVLTLENQPAIVDFSRTAFITATGERVAQIQPITAGTSLQV sctS_
TPRVVGQDGPRSIQLVIDIEDGRVETGRDGEATGVKRGTVSTQALIGENRALVLGGFHVEESGDRDHRIP sctT_
LLGDIPWLGRLFTSTRHEVSRRERLFILTPHLIGDQTDPTRYVSAENRHQINDVMNRVSQRNGKHDLYSL sctU_
VENALRDLAGKQLPAGFQSETRGTRLSEVCRSQPGLVYDSNRYQWYGNGSIRLTVGVVRNSGTRIQRFDE
SVCGSNRTLAVAAWPKTTLAPGESTEVFLALQTLSSTAPPRRSLLASY
>sctC_12a_02741 hypothetical protein
MKTDLRALFLLLSLLLMGCGDPIELNRGLSENDANEVIAALGRYQIAAEKRVDKTGVTLIIDAKNMERAV
NILNAAGLPRQSRTNLGEVFQKSGVISTPLEERARYIYALSQEVEATLTQIDGVLVARVHVVLPERIAPG
EPVQPASAAVFIKYQPELEPDSVEPRIRRMVASSIPGLSGKNDKDLSIVFVPAEPYQDTIPVVTLGPFTL
TPQEMVRWQWTAGLMGALIIGLLAWRLGKPYMRQWQQNRADARQQR
>sctC_12a_02750 Invasion protein InvA
MNLVIIWLNRIALSAMQRSEVVGAVIVMSIVFMMIIPLPTSLIDVLIAFNICVSSLLIVLAMYLPKPLAF
STFPAVLLLTTMFRLALSISTTRQILLQQDGGHIVEAFGNYVVGGNLAVGLVIFLILTVVNFLVITKGSE
RVAEVAARFTLDAMPGKQMSIDSDLRAGLIEAHQARQRRDNLAKESQLFGAMDGAMKFVKGDAIAGLVIV
FINMIGGFAIGVLQHGMSAADAMHVYSVLTIGDGLIAQIPALLISLTAGMIITRVSAEGQPLDANIGREI
AEQLTSQPKAWIISALGMFGFALLPGMPSMVFMVISLASFSSGVFQLWRIKQQGILTHSQAEADNQPAEQ
NGHQDLRRFNPTRAYLLQFHPSMQGNPATLSLVQHIRRLRNRLVYQFGMTLPSFDIEFSDRLDEDEFQFG
VYEIPYVKATFVTERLAVHRSSFDQGELEDAIAGSTLRDEADWLWVSPMHPLLEQETCPRWAAGELILMR
MENAIHRSGAQFIGLQETKSILTWLESEQPELAQELQRIMPLSRFAGVLQRLASERIPLRSVRPIAEALI
EIGQHERDVHALTDYVRLALKAQICHQYSQQNTLHVWLLTPETEELLRDSLRQTQNETFFALTQDYAATL
LGQLRRAFPPSLPSTGQILVAQDLRTPLRVLLQEEFHHVPVLSFSELESHLSINVLGRFDLYEENTPFSA
>sctC_12a_02752 Type III secretion ATP synthase HrcN
MQTQAAIDFPLMTRWFQQQRRRLSDFAPVDLKGRIIGISGILLECSLPRARIGDLCLVERQDGSQVMAEV
VGFSPRNTFLSALGALDGIAQGAAVAPLYQPHCIQVSDRLFGSVLDGFGRALEDGGESAFVQPGELHGNA
QPVLGDAPPPTARPRIATPLPTGLRAIDGLLTLGQGQRVGIFAGAGCGKTTLLAELARNTPCDAIVFGLI
GERGRELREFLDHELDDDLRRRTVLVCSTSDRSSMERARAAFTATAIAEAYRAAGKQVLLIIDSLTRFAR
AQREIGLALGEPQGRGGLPPSVYTLLPRLVERAGQTQTGAITALYSVLIEQDSMNDPVADEVRSLIDGHI
VLTRRLAEQGHYPAIDVLASLSRTMSNVVDDGHNRHAGAVRRLMAAYKQVEMLIRLGEYQSGHDALTDSA
VNAQQDITRFLRQAMRDPMAYDDIQQQLAEVSAHAP
How can I get the strings from file1 added in order, one after each greater-than symbol in file2?
Thanks,
JD
I'm not sure I fully understand your requirements, but Perl should handle this easily. Read the first file into an array, then iterate over the second one and use the array to add the missing information.
perl -we 'push @s, scalar <> until eof;
chomp @s;
s/(?<=^>)/shift @s/e, print while <>;
' file1 file2
<> is a shorter version of readline; in scalar context it reads one line from the file.
(?<=...) is a lookbehind; in this case it matches right after the > at the beginning of a line.
The /e modifier to the substitution operator s/// evaluates the replacement as code.
shift extracts the first element from the array @s.
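For comparison, the same two-pass idea (load file1 into an array, then walk file2 and consume one entry per header) can be sketched with awk from the shell. This is only a sketch under assumptions: the sample file contents below are made up for illustration, and the array name labels is hypothetical. It also avoids a problem in the original loop, where sed is given no file argument and therefore reads the rest of the loop's own standard input on the first iteration.

```shell
# Hypothetical sample data standing in for the real file1 and file2.
tmp=$(mktemp -d)
printf '%s\n' 'sctC_' 'sctJ_' 'sctV_' > "$tmp/file1"
printf '%s\n' '>12a_02741 hypothetical protein' 'MKTDLRALF' \
              '>12a_02750 Invasion protein InvA' 'MNLVIIWLN' \
              '>12a_02752 Type III secretion ATP synthase HrcN' 'MQTQAAIDF' > "$tmp/file2"

# First file (NR == FNR): store every line of file1 in the array "labels".
# Second file: on each line starting with ">", splice the next label in
# right after the ">". Every line of file2 is printed, changed or not.
result=$(awk '
    NR == FNR { labels[NR] = $0; next }
    /^>/      { $0 = ">" labels[++i] substr($0, 2) }
    { print }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

The header is rebuilt with substr rather than sub so that a label containing & is inserted literally. If file2 has more headers than file1 has lines, the extra headers are left unchanged.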