for loop writes only last result to file


Hi there, I've been playing a bit with for loops in Bash to edit a FASTA file.

The file has 24 header lines that start with the '>' character, as follows:

>CP068277.2
>CP068276.2
>CP068275.2
>CP068274.2
>CP068273.2
>CP068272.2
>CP068271.2
>CP068270.2
>CP068269.2
>CP068268.2
>CP068267.2
>CP068266.2
>CP068265.2
>CP068264.2
>CP068263.2
>CP068262.2
>CP068261.2
>CP068260.2
>CP068259.2
>CP068258.2
>CP068257.2
>CP068256.2
>CP068255.2
>CP086569.2

These are actually chromosomes, and I need them to be in the form >chr1, >chr2, etc.

I wrote the following for loop:

for ((c=1; c<=24; c++)); 
  do 
    sed 's/>/>chr'"$c"' /' CHM13v2.0_no-mito.fna > CHM13v2.0_no-mito_trial.fna;
done

The output, however, shows >chr24 on every header instead of a running count (see below). Does anyone have any idea why?

>chr24 CP068277.2
>chr24 CP068276.2
>chr24 CP068275.2
>chr24 CP068274.2
>chr24 CP068273.2
>chr24 CP068272.2
>chr24 CP068271.2
>chr24 CP068270.2
>chr24 CP068269.2
>chr24 CP068268.2
>chr24 CP068267.2
>chr24 CP068266.2
>chr24 CP068265.2
>chr24 CP068264.2
>chr24 CP068263.2
>chr24 CP068262.2
>chr24 CP068261.2
>chr24 CP068260.2
>chr24 CP068259.2
>chr24 CP068258.2
>chr24 CP068257.2
>chr24 CP068256.2
>chr24 CP068255.2
>chr24 CP086569.2

P.S. No need to worry about the text that follows >chr24 on each header; I have a way to remove it with sed. Nonetheless, it would be nice to have everything done in one step.

Thanks in advance!


Solution

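  • Your loop overwrites the output file on each iteration: the `>` redirection inside the loop truncates the file every time, so only the result of the last pass (c=24) survives. To keep the output of every iteration, redirect the loop as a whole: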

    for ((c=1; c<=24; c++)); do
        sed 's/>/>chr'"$c"' /' CHM13v2.0_no-mito.fna
    done > CHM13v2.0_no-mito_trial.fna
    

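    Note that this still runs sed over the whole file 24 times, so the output is 24 concatenated copies of the file, each with every header tagged with the same number. The following numbers the headers in a single pass, so it should be far more efficient, and it doesn't hard-code how many header lines you hope the file contains: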

    awk 'sub(/>/,""){$0=">chr" (++c) " " $0} 1' CHM13v2.0_no-mito.fna > CHM13v2.0_no-mito_trial.fna
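To see what the awk one-liner does without touching the real genome file, here is a small sketch that runs it on a three-record sample. The file names `sample.fna`, `sample_renamed.fna` and `sample_short.fna` and the short sequences are made up for the demo. The second command is a variation, not part of the answer above, that assumes you want only `>chrN` with the accession dropped.

```shell
# Hypothetical three-record FASTA file, just for illustration.
printf '%s\n' '>CP068277.2' 'ACGT' '>CP068276.2' 'GGCC' '>CP068275.2' 'TTAA' > sample.fna

# Same awk program as in the answer: sub() removes the '>' and returns 1 only
# on header lines, where the line is rebuilt with a running counter c.
# The trailing 1 is an always-true pattern that prints every line.
awk 'sub(/>/,""){$0=">chr" (++c) " " $0} 1' sample.fna > sample_renamed.fna
cat sample_renamed.fna
# >chr1 CP068277.2
# ACGT
# >chr2 CP068276.2
# GGCC
# >chr3 CP068275.2
# TTAA

# Variation (assumption: the accession is not needed at all):
# replace each header line entirely with >chrN.
awk '/^>/{$0=">chr" (++c)} 1' sample.fna > sample_short.fna
```

Because the counter is incremented only when a header line is seen, the numbering follows the order of the records in the file, whatever their count.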