Remove commas between patterns


I am trying to use sed to replace every comma between ";" and "[" with a space in my text file. For example, the line K01810,GPI,,pgi;,glucose-6-phosphate,isomerase,[EC:5.3.1.9] should become K01810,GPI,,pgi; glucose-6-phosphate isomerase [EC:5.3.1.9].

I have tried sed 's/;\(.*\)\[/;\1[/g' myfile.txt > newfile but that didn't change anything.

myfile.txt:

09100,Metabolism
09101,Carbohydrate,metabolism
00010,Glycolysis,/,Gluconeogenesis,[PATH:ko00010]
K00844,HK;,hexokinase,[EC:2.7.1.1]
K12407,GCK;,glucokinase,[EC:2.7.1.2]
K00845,glk;,glucokinase,[EC:2.7.1.2]
K25026,glk;,glucokinase,[EC:2.7.1.2]
K01810,GPI,,pgi;,glucose-6-phosphate,isomerase,[EC:5.3.1.9]
K06859,pgi1;,glucose-6-phosphate,isomerase,,archaeal,[EC:5.3.1.9]
K13810,tal-pgi;,transaldolase,/,glucose-6-phosphate,isomerase,[EC:2.2.1.2,5.3.1.9]
K15916,pgi-pmi;,glucose/mannose-6-phosphate,isomerase,[EC:5.3.1.9,5.3.1.8]
K24182,PFK9;,6-phosphofructokinase,[EC:2.7.1.11]
K00850,pfkA,,PFK;,6-phosphofructokinase,1,[EC:2.7.1.11]
K16370,pfkB;,6-phosphofructokinase,2,[EC:2.7.1.11]

Solution

  • sed '
        # skip lines that do not match
        /^\(.*\)\(;.*\[\)\(.*\)$/ ! b
    
        # save a copy of original line
        h
    
        # extract prefix/suffix
        s//\1\n\3/
    
        # save them, restore original line
        x
    
        # extract part to change
        s//\2/
    
        # do the substitution
        y/,/ /
    
        # append prefix/suffix
        G
    
        # rearrange
        s/^\([^\n]*\)\n\([^\n]*\)\n/\2\1/
        
    ' myfile.txt >newfile
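To see what each command leaves in the pattern space and the hold space, here is the same script run on one line from the question, with the intermediate state written as a comment after each command. This is only a sketch: it assumes a sed that accepts \n in the replacement (GNU sed, for instance), and fix_commas_hold is a made-up wrapper name, not part of the answer.

```shell
# fix_commas_hold is a hypothetical wrapper so the script can read stdin.
# Assumes a sed that understands \n in the replacement (e.g. GNU sed).
fix_commas_hold() {
    sed '
        /^\(.*\)\(;.*\[\)\(.*\)$/ ! b
        h
        # hold space now has a copy of the whole line
        s//\1\n\3/
        # pattern space: K01810,GPI,,pgi<newline>EC:5.3.1.9]
        x
        # pattern space: whole line again; hold space: prefix<newline>suffix
        s//\2/
        # pattern space: ;,glucose-6-phosphate,isomerase,[
        y/,/ /
        # pattern space: ; glucose-6-phosphate isomerase [
        G
        # pattern space: middle<newline>prefix<newline>suffix
        s/^\([^\n]*\)\n\([^\n]*\)\n/\2\1/
        # pattern space: prefix, middle and suffix joined back in order
    '
}

printf '%s\n' 'K01810,GPI,,pgi;,glucose-6-phosphate,isomerase,[EC:5.3.1.9]' | fix_commas_hold
# prints: K01810,GPI,,pgi; glucose-6-phosphate isomerase [EC:5.3.1.9]
```

The commas inside the brackets are untouched because they fall in the third group, which is never passed through y/,/ /.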
    

    Note: For portability, write the newline in the replacement as a backslash followed by a literal newline, because some versions of sed treat \n in the replacement as a plain n:

        # ...
    
        # extract prefix/suffix
        s//\1\
    \3/
    
        # ...
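If the newline handling is a concern, one alternative that never puts a newline in the pattern space is a substitution loop: turn one comma between ";" and the next "[" into a space, then repeat until none is left. This is a different technique from the hold-space script above, offered as a sketch rather than a drop-in replacement. fix_commas_loop is a hypothetical name, and the loop has only been checked against lines like the sample data, with a single ";" and a single "["; lines with several of either may be handled differently by the two approaches.

```shell
# fix_commas_loop is a hypothetical name; it reads stdin and writes stdout.
fix_commas_loop() {
    # :a      label marking the start of the loop
    # s/.../  turn the last comma between ";" and the next "[" into a space
    # ta      jump back to :a if that substitution changed anything
    sed -e ':a' -e 's/\(;[^[]*\),\([^[]*\[\)/\1 \2/' -e 'ta'
}

fix_commas_loop <<'EOF'
00010,Glycolysis,/,Gluconeogenesis,[PATH:ko00010]
K06859,pgi1;,glucose-6-phosphate,isomerase,,archaeal,[EC:5.3.1.9]
K15916,pgi-pmi;,glucose/mannose-6-phosphate,isomerase,[EC:5.3.1.9,5.3.1.8]
EOF
```

On a real file this would be used as fix_commas_loop < myfile.txt > newfile. Lines without ";" pass through unchanged, and a doubled comma such as ",," becomes two spaces, the same as with y/,/ /.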