Search code examples
linuxstringreplacepipedelimiter

How to replace the occurrences of pipe (|) without replacing the pipe used as a delimiter


On Linux server, I have pipe-delimited files that have pipe (|) characters in some string columns. The files are pipe-delimited but the delimiter pipe is always in double quotes while pipes in the text/string can have any characters, space, etc around them. I would like to replace the non-delimiter pipes within the strings with a dash sign (-)

Here is an example:

col1|col2|col3|col4
"1"|"This is my column 2 |Although there is pipe here, it is not a delimiter pipe."|"And this is my 3rd column '|" with a pipe followed by a double quote"|"|and finally this 4th column starts with a | that is not'|a delimiter

I have tried some SED and AWK commands but could not properly exclude the delimiters ("|") from the replacement in the strings.

I am trying to get an output like this:

col1|col2|col3|col4
"1"|"This is my column 2 -Although there is pipe here, it is not a delimiter pipe."|"And this is my 3rd column '-" with a pipe followed by a double quote"|"-and finally this 4th column starts with a - that is not'-a delimiter

Solution

  • One of the ways I can think of is to replace "|" with some special character (## in this case) and replace all occurances of | with - then replace back ## with "|"

    cat file.txt | sed -e "s/\"|\"/##/g" | tr "|" - | sed -e "s/##/\"|\"/g"
    

    Output -

    col1-col2-col3
    "1"|"This is my column 2 -Although there is pipe here, it is not a delimiter pipe."|"And this is my 2nd column '-" with a pipe followed by a double quote"|"-and finally this 3rd column starts with a - that is not'-a delimiter