regex bash sed non-greedy

regex in sed removing only the first occurrence from every line


I have the following file that I would like to clean up:

cat file.txt

MNS:N+    GYPA*01 or GYPA*M   
MNS:M+    GYPA*02 or GYPA*N
MNS:Mc    GYPA*08 or GYP*Mc
MNS:Vw    GYPA*09 or GYPA*Vw
MNS:Mg    GYPA*11 or GYPA*Mg
MNS:Vr    GYPA*12 or GYPA*Vr

My desired output is:

MNS:N+  GYPA*01 or GYPA*M   
MNS:M+  GYPA*02 or GYPA*N
MNS:Mc  GYPA*08 or GYP*Mc
MNS:Vw  GYPA*09 or GYPA*Vw
MNS:Mg  GYPA*11 or GYPA*Mg
MNS:Vr  GYPA*12 or GYPA*Vr

I would like to remove everything between ":" and the first occurrence of "or".

I tried sed 's/MNS:d*?or /MNS:/g', but it removes the second "or" as well.

I tried every option in https://www.geeksforgeeks.org/sed-command-in-linux-unix-with-examples/ to no avail. Should I create alias sed='perl -pe'? It seems that sed does not properly support regex.


Solution

  • perl is more suitable here because the task needs a lazy (non-greedy) match, which sed's POSIX regular expressions do not provide.

    perl -pe 's|(:.*?or +)(.*)|:$2|' Input_file
    

    With .*?or, the match stops at the nearest (first) "or" after the colon instead of running on to the last "or" in the line.