bash sed

Remove all occurrences of a dot only between two strings, not in the rest of the file


Apologies if this has been asked before; I could not find an answer to my question.

Assume the following file, created with these commands:

touch test2

echo "refH.fasta = ref/GCA_013924565.1_ASM1392456v1_genomic.fasta" >> test2


echo "subjectGCA_017717955.1_PDT000990484.1_genomic_querry.fasta = GCA_017717955.1_PDT000990484.1_genomic.fasta" >> test2

echo "file=subjectGCA_017717955.1_PDT000990484.1_genomic_querry" >> test2

In the above file I would like to remove the dots ONLY between the strings 'subject' and '_querry', and leave every other dot in the file untouched.

Therefore the output should look like this:

refH.fasta = ref/GCA_013924565.1_ASM1392456v1_genomic.fasta
subjectGCA_0177179551_PDT0009904841_genomic_querry.fasta = GCA_017717955.1_PDT000990484.1_genomic.fasta
file=subjectGCA_0177179551_PDT0009904841_genomic_querry

thanks


Solution

  • Here is a Ruby one-liner to do that (the end marker is spelled `_querry`, as in the question's file):

    ruby -lpe '$_=$_.split(/(subject.*?_querry)/).
        map{|s| s[/subject.*?_querry/] ? s.gsub(/\./,"") : s}.join' test2
    

    Or Perl:

    perl -lnE '@a=(); for $x (split /(subject.*?_querry)/){ 
        $x=~s/\.//g if $x=~/subject.*?_querry/; 
        push @a,$x } 
        say join("",@a)' test2
    

    It is possible entirely in Bash:

    while IFS= read -r line || [[ -n $line ]]; do 
        if [[ $line =~ (subject.*_querry) ]]; then 
            line=${line/"${BASH_REMATCH[1]}"/"${BASH_REMATCH[1]//./}"}
        fi     
        printf "%s\n" "$line"
    done <test2
    

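    But this handles only one match per line: Bash's regex has no non-greedy `.*?`, so the match runs from the first `subject` to the last `_querry` on the line.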
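
If a line can hold several `subject..._querry` spans, the Bash approach can be extended. The following is only a minimal sketch, assuming the marker spelling from the question's file; the function name `strip_dots_between` is hypothetical. It peels off one span at a time with parameter expansion, which emulates a non-greedy match:

```shell
#!/usr/bin/env bash
# strip_dots_between (hypothetical helper): print the given line with every "."
# removed inside each subject..._querry span; all other dots are kept.
strip_dots_between() {
    local line=$1 out="" pre rest match
    # Keep going while the remaining text still contains a complete span.
    while [[ $line == *subject*_querry* ]]; do
        pre=${line%%subject*}            # text before the first "subject"
        rest=${line#"$pre"}              # remainder, starting at "subject"
        match=${rest%%_querry*}_querry   # shortest span ending at the first "_querry"
        out+=$pre${match//./}            # copy prefix as-is, span without dots
        line=${rest#"$match"}            # continue after the span
    done
    printf '%s\n' "$out$line"
}

# Usage (file name taken from the question):
#   while IFS= read -r line || [[ -n $line ]]; do
#       strip_dots_between "$line"
#   done <test2
```

Because each iteration stops at the first `_querry` after the first `subject`, dots between two separate spans are left in place.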