I know this has been asked before, but I cannot find a solution that works. For some reason, none of the other solutions posted on Stack Overflow work for me.
I have a directory with 900+ FASTA files, all ending in ".faa". Some of the names are:
TLLD001.faa TLLD002.faa TLLD003.faa TLLD004.faa TLLD005.faa
etc etc
Within each file, the FASTA headers look like:
>scaffold4567
WRVLSTSFNGIKYEQSAAFAMIPSTT
>scaffold0034
EQSAAFAMIPSTTSISWRVLSTSFNGIKYEQ
or
>NODE_212
WRVLSTSFNGIKYEQSAAFAMIPSTT
>NODE_86667
EQSAAFAMIPSTTSISWRVLSTSFNGIKYEQ
etc etc
I want to go through all the files and prepend the filename (without the extension) to each header. For example, TLLD001.faa
>scaffold4567
WRVLSTSFNGIKYEQSAAFAMIPSTT
>scaffold0034
EQSAAFAMIPSTTSISWRVLSTSFNGIKYEQ
>scaffold7667
WRVLSTSFNGIKYEQSAAFAMIPSTT
>scaffold6778
EQSAAFAMIPSTTSISWRVLSTSFNGIKYEQ
should become
>TLLD001_scaffold4567
WRVLSTSFNGIKYEQSAAFAMIPSTT
>TLLD001_scaffold0034
EQSAAFAMIPSTTSISWRVLSTSFNGIKYEQ
>TLLD001_scaffold7667
WRVLSTSFNGIKYEQSAAFAMIPSTT
>TLLD001_scaffold6778
EQSAAFAMIPSTTSISWRVLSTSFNGIKYEQ
This works nicely, but I have to specify a single file every time:
$ awk '/>/{sub(">","&"FILENAME"_");sub(/\.faa/,x)}1' TLLD001.faa
so not my cup of tea
This seemed to work on the 3-4 files I used as a test, but it will not work in my directory of 900+ files (it takes forever):
for i in *.faa; do
sed -i "s/^>/>${i}_/g" *.faa
done
And the following do not work at all:
$ for file in *.fasta; do awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' < $file > "`basename $file .fasta`_single-line.fasta"; done
and
$ for file in *.faa; do awk '/>/{sub(">","&"${file}"_");sub(/\.faa/,x)}1' < $file > "`basename $file .faa`_mod.faa"; done
and I don't know why! Any help and explanation of how to use this almighty but cryptic "awk" would be highly appreciated.
thanks P
The sed solution is the way to go, but you repeated the glob inside the command!
Instead of
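for f in *.faa; do sed -i "s/^>/>${f%.faa}_/" *.faa; done
Use the ${f} variable as the file argument to sed. Otherwise the *.faa glob is expanded again inside the loop, so every iteration rewrites all 900+ files and stacks another prefix onto every header, which is why it takes forever.
for f in *.faa; do sed -i "s/^>/>${f%.faa}_/" "${f}"; done
I also made use of bash parameter expansion (${f%.faa}) to strip .faa from the filename, and added the underscore separator from your expected output.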
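Since you asked about awk: your last attempt most likely fails because ${file} sits inside single quotes, so the shell never expands it and awk sees it as literal text. One way around that is to pass the prefix in with -v. Here is a minimal sketch, assuming simple names like TLLD001.faa (no backslashes or & in the filename); the function name prefix_headers and the variable p are just names I made up, and the _mod.faa output naming is borrowed from your attempt. It writes new files instead of editing in place.

```shell
#!/usr/bin/env bash
# Sketch: prefix every FASTA header with the base name of its file.
# prefix_headers is a hypothetical helper name, not a standard tool.
prefix_headers() {
    for f in "$@"; do
        [ -e "$f" ] || continue                      # skip an unmatched glob
        case "$f" in *_mod.faa) continue ;; esac     # skip outputs of an earlier run
        base="${f##*/}"                              # drop any leading directory
        prefix="${base%.faa}"                        # drop the .faa extension
        # -v hands the shell value to awk as the awk variable p.
        # On header lines, replace the leading ">" with ">prefix_".
        # The trailing 1 prints every line, changed or not.
        awk -v p="$prefix" '/^>/ { sub(/^>/, ">" p "_") } 1' "$f" > "${f%.faa}_mod.faa"
    done
}
```

Call it as prefix_headers *.faa from inside the directory. Each file is read exactly once, so 900 files should not be a problem, and the originals stay untouched until you are happy with the result.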