Tags: regex, csv, notepad++

How to replace double quotes inside a tag in a CSV file using regex


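I have a pipe-delimited CSV file in which every field is enclosed in double quotes, and some fields contain a <Note> tag whose text also contains double quotes. I need to remove the double quotes inside the tag (or replace them with some other character) while keeping the quotes that enclose each field. For example: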

"id"|"Name"|"Note"
"1"|"Sam"|"<Note> This is "a" Sample </Note>"
"2"|"Sam1"|"<Note> This "is "a" Sam"ple "</Note>"

Desired output:

"id"|"Name"|"Note"
"1"|"Sam"|"<Note> This is a Sample </Note>"
"2"|"Sam1"|"<Note> This is a Sample </Note>"

Solution

  • Here is one way to do it in Notepad++:

    • Ctrl+H
    • Find what: (?:<Note>|\G(?!^))(?:(?!</Note)[^"])*\K"(?=.*</Note>)
    • Replace with: LEAVE EMPTY
    • CHECK Match case
    • CHECK Wrap around
    • CHECK Regular expression
    • UNCHECK . matches newline
    • Replace all
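If you need the same clean-up outside Notepad++, for example in a batch script, note that Python's built-in `re` module supports neither `\G` nor `\K`, so the pattern above cannot be reused as-is. The sketch below swaps in a different technique, assuming each `<Note>...</Note>` element sits on a single line and is never nested: match each whole element and strip the quotes from it with a replacement function. The function name `strip_quotes_in_notes` is hypothetical.

```python
import re

# One <Note>...</Note> element; non-greedy so several elements on a line are
# handled separately, and '.' does not cross newlines by default.
NOTE_RE = re.compile(r"<Note>.*?</Note>")


def strip_quotes_in_notes(text: str) -> str:
    """Remove every double quote that sits between <Note> and </Note>."""
    return NOTE_RE.sub(lambda m: m.group(0).replace('"', ""), text)


sample = '''"id"|"Name"|"Note"
"1"|"Sam"|"<Note> This is "a" Sample </Note>"
"2"|"Sam1"|"<Note> This "is "a" Sam"ple "</Note>"'''

# Prints the desired output from the question.
print(strip_quotes_in_notes(sample))
```

To clean a file, read its contents, pass them through `strip_quotes_in_notes`, and write the result to a new file. The quotes that enclose each field are kept because they lie outside the `<Note>` element.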

    Explanation:

    (?:                 # non-capturing group
        <Note>              # literally, the opening tag
      |                   # OR
        \G(?!^)             # continue from the end of the previous match, but not at the beginning of a line
    )                   # end group
    (?:                 # non-capturing group
        (?!                 # negative lookahead, make sure the following text is not:
            </Note              # literally, the closing tag
        )                   # end lookahead
        [^"]                # any character that is not a double quote
    )*                  # end group, may appear 0 or more times
    \K                  # forget everything matched up to this position
    "                   # double quote
    (?=.*</Note>)       # positive lookahead, make sure a closing tag follows on the same line
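To see how the `\G` chaining walks from one quote to the next, here is a minimal Python sketch that replays the pattern step by step on a single line. Because Python's `re` has no `\G` or `\K`, this illustration assumes two stand-ins: `\G` is emulated with `Pattern.match(line, pos)`, which only succeeds if the match begins exactly at `pos`, and `\K` is emulated by recording only the index of the final quote. The helper names `quote_positions` and `remove_quotes` are hypothetical.

```python
import re

OPEN = "<Note>"
# Mirrors (?:(?!</Note)[^"])*" : non-quote characters that do not start
# "</Note", followed by exactly one double quote.
STEP_RE = re.compile(r'(?:(?!</Note)[^"])*"')
# Mirrors the lookahead (?=.*</Note>): a closing tag still follows on this line.
CLOSE_AHEAD_RE = re.compile(r".*</Note>")


def quote_positions(line: str) -> list[int]:
    """Return the indices of the quotes the Notepad++ pattern would delete."""
    positions = []
    start = line.find(OPEN)
    while start != -1:
        pos = start + len(OPEN)  # the first match begins right after <Note>
        while True:
            # match() anchored at pos plays the role of \G: continue exactly
            # where the previous match ended, or stop the chain.
            m = STEP_RE.match(line, pos)
            if m is None or CLOSE_AHEAD_RE.match(line, m.end()) is None:
                break
            positions.append(m.end() - 1)  # like \K: only the quote is replaced
            pos = m.end()
        start = line.find(OPEN, pos)  # chain broken: look for the next <Note>
    return positions


def remove_quotes(line: str) -> str:
    """Delete the quotes found by quote_positions (Replace with: empty)."""
    drop = set(quote_positions(line))
    return "".join(ch for i, ch in enumerate(line) if i not in drop)


print(remove_quotes('"2"|"Sam1"|"<Note> This "is "a" Sam"ple "</Note>"'))
```

The chain stops as soon as the next character run reaches `</Note`, which is why the quote that closes the field after `</Note>` survives. A quote inside a `<Note>` with no closing tag later on the line is also left alone, because the lookahead fails.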
    
