regex bash perl awk tr

Regex (or bash), get pipes between quotes (perl)


Update: Please keep in mind that regex is my only option.

Update 2: Actually, I can use a bash based solution as well.

Trying to replace the pipes (there can be more than one) that are between double quotes with commas, using a Perl regex.

Example

continuer|"First, Name"|123|12412|10/21/2020|"3|7"||Yes|No|No|

Expected output (3 and 7 are separated by a comma)

continuer|"First, Name"|123|12412|10/21/2020|"3,7"||Yes|No|No|

There may be more digits; it may not be just the two-digit \d\|\d case. It could be "3|7|2", and the correct output for that one has to be "3,7,2". I've tried the following:

cat <filename> | perl -pi -e 's/"\d+\|[\|\d]+/\d+,[\|\d]+/g'

but it just inserts the literal string d+ etc. instead of the matched digits.

I'd really appreciate your help. ty


Solution

  • If it must be a regex, here is a simpler one:

    perl -wpe's/("[^"]+")/ $1 =~ s{\|}{,}gr /eg' file
    

    Not bullet-proof (for example, it assumes that quoted fields are non-empty and contain no escaped quotes), but it should work for the shown use case.

    Explanation. With the /e modifier the replacement side is evaluated as code. There, a substitution runs on $1 under the /r modifier, which leaves the original ($1) unchanged; the $N capture variables are read-only, so we can't modify $1 and thus couldn't run a "normal" s/// on it. With /r the changed string is returned, or the original if there were no changes. Just as ordered.

    Once it's tested well enough, add -i to change the input file "in-place" if wanted.


    I must add, I see no reason that at least this part of the job can't be done using a CSV parser...


    Thanks to ikegami for an improved version

    perl -wpe's/"[^"]+"/ $& =~ tr{|}{,}r /eg' file
    

    It's simpler, with no need to capture (the whole match is available in $&), and tr is faster than a regex substitution.


    Tested with strings like those in the question, extended only as far as this:

    con|"F, N"|12|10/21|"3|7"||Yes|"2||4|12"|"a|b"|No|""|end|