Tags: regex, bash, sed, non-greedy

Sed replace at second occurrence


I want to remove a pattern with sed, but only at its second occurrence on each line.

Contents of file.csv:

a,Name(null)abc.csv,c,d,Name(null)abc.csv,f
a,Name(null)acb.csv,c,d,Name(null)acb.csv,f
a,Name(null)cba.csv,c,d,Name(null)cba.csv,f

Desired output:

a,Name(null)abc.csv,c,d,Name,f
a,Name(null)acb.csv,c,d,Name,f
a,Name(null)cba.csv,c,d,Name,f

This is what I tried:

sed -r 's/(\(null)\).*csv//' file.csv

The problem here is that the regex is too greedy: .* runs on to the last "csv" on the line, and I cannot make it stop earlier. I also tried this, to skip the first occurrence of "null":

sed -r '0,/null/! s/(\(null)\).*csv//' file.csv

I also tried the following, but the greedy regex is still the problem:

sed -r 's/(\(null)\).*csv//2' file.csv

I've read that ? can make the regex "lazy", but I cannot make it work:

sed -r 's/(\(null)\).*?csv//' file.csv

Solution

  • The more robust awk solution:

    Extended sample file input.csv:

    12,Name(null)randomstuff.csv,2,3,Name(null)randomstuff.csv, false,Name(null)randomstuff.csv
    12,Name(null)AotherRandomStuff.csv,2,3,Name(null)AotherRandomStuff.csv, false,Name(null)randomstuff.csv
    12,Name(null)alphaNumRandom.csv,2,3,Name(null)alphaNumRandom.csv, false,Name(null)randomstuff.csv
    

    The job:

    awk -F, '{ c=0; for(i=1;i<=NF;i++) if($i~/\(null\)/ && c++==1) sub(/\(null\).*/,"",$i) }1' OFS=',' input.csv
    

    The output:

    12,Name(null)randomstuff.csv,2,3,Name, false,Name(null)randomstuff.csv
    12,Name(null)AotherRandomStuff.csv,2,3,Name, false,Name(null)randomstuff.csv
    12,Name(null)alphaNumRandom.csv,2,3,Name, false,Name(null)randomstuff.csv
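
  • A sed-only alternative, as a sketch:

    The attempts in the question fail because sed's POSIX regular expressions have no lazy quantifier, so .*? is still greedy, and an address such as 0,/null/ selects lines, not occurrences within a line. One way around this, assuming the part to remove never contains a comma, is to replace .* with [^,]* so the match cannot run past the current field, and then use the numeric flag 2 to replace only the second match. The temporary directory and the result variable below are only there to make the example self-contained; -E is used for extended regexes (-r is the GNU synonym used in the question).

```shell
# Recreate the question's file.csv in a temporary directory (sample data from the question).
tmpdir=$(mktemp -d)
cat > "$tmpdir/file.csv" <<'EOF'
a,Name(null)abc.csv,c,d,Name(null)abc.csv,f
a,Name(null)acb.csv,c,d,Name(null)acb.csv,f
a,Name(null)cba.csv,c,d,Name(null)cba.csv,f
EOF

# \(null\)  matches the literal text "(null)"
# [^,]*     matches everything up to, but not including, the next comma
# 2         replaces only the second match on each line
result=$(sed -E 's/\(null\)[^,]*//2' "$tmpdir/file.csv")
printf '%s\n' "$result"

# Clean up the temporary sample file.
rm -rf "$tmpdir"
```

    On the question's file.csv this should print the desired output. Unlike the awk version, it relies on the pattern being confined to a single comma-separated field, which is why the awk solution is described as the more robust one.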