Combining replacement strings and regular expressions in GNU Parallel


I have a list of file paths of the format:

/data/nicotine_sensi/bam/9-2_box_1_S23_starAligned.sortedByCoord.out.bam
/data/nicotine_sensi/bam/9-2_box_3_S101_starAligned.sortedByCoord.out.bam
/data/nicotine_sensi/bam/9-3_box_1_S24_starAligned.sortedByCoord.out.bam
/data/nicotine_sensi/bam/9-3_box_3_S102_starAligned.sortedByCoord.out.bam

I want to feed these into a GNU Parallel command so that a predefined replacement string and a Perl or --plus replacement string are applied at the same time, but I couldn't find a solution in the tutorials. Ideally, {/...} and {%_starAligned} would work together to produce:

9-2_box_1_S23
9-2_box_3_S101
9-3_box_1_S24
9-3_box_3_S102

However, the closest I have got is:

parallel --rpl '{..} s:/data/nicotine_sensi/bam/::;s:_starAligned.sortedByCoord.out.bam::' \
  echo {..} ::: "$bam_dir"/*.bam

This is messy and does not port well to other directories.


Solution

  • The definition of {/...} is:

    s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::;
    

    The definition of {%(.*)} is:

    s/$$1$//;
    

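    So you could combine them in one user-defined replacement string, where the text matched by the parenthesised group in the tag is inserted as $$1: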

    echo /data/nicotine_sensi/bam/9-3_box_1_S24_starAligned.sortedByCoord.out.bam |
      parallel --rpl '{¤([^}]+?)} s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::; s/$$1$//;' echo {¤_starAligned}
    

    If you know you will always remove a trailing _something, then:

    echo /data/nicotine_sensi/bam/9-3_box_1_S24_starAligned.sortedByCoord.out.bam |
      parallel --rpl '{¤} s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::; s/_[^_]+$//;' echo {¤}
    

    If you will be using this a lot, putting it in a profile is probably a good idea.
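
As a rough sketch of the profile idea: GNU Parallel reads options from files under ~/.parallel, and `-J name` (`--profile name`) selects the file ~/.parallel/name. The profile name `bamname` below is made up for illustration. The snippet also assumes that your version parses quoted options in a profile the way a shell would, so check it against your installation before relying on it.

```shell
# Write the --rpl definition into a profile called "bamname" (hypothetical name).
mkdir -p "$HOME/.parallel"
cat > "$HOME/.parallel/bamname" <<'EOF'
--rpl '{¤([^}]+?)} s:.*/::; s:\.[^/.]+$::; s:\.[^/.]+$::; s:\.[^/.]+$::; s/$$1$//;'
EOF

# Intended use afterwards (requires GNU Parallel):
#   parallel -J bamname echo '{¤_starAligned}' ::: "$bam_dir"/*.bam
```

The quoted heredoc delimiter keeps `$$1` and the backslashes from being expanded by the shell, so the profile holds the expression exactly as it is written on the command line.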