
RegEx and multiline PO files


I'm trying to create a regex in Notepad++ for a simple search and replace.

The lines are the following:

msgid ""
" CONSUMPTION_PLAN_ERR|The Finished Good's BOM has been changed since production was added.\n"
" To continue using this Job with the new BOM, please update lots and expiries.\n"
" Previous Production Records will not be updated.\n"
msgstr ""
" The Finished Good's BOM has been changed since production was added.\n"
" To continue using this Job with the new BOM, please update lots and expiries.\n"
" Previous Production Records will not be updated.\n"

I need to change the double quotation marks, but only in the msgid part: each opening quote should become a less-than sign (<) and each closing quote a greater-than sign (>). The result should be:

msgid <>
< CONSUMPTION_PLAN_ERR|The Finished Good's BOM has been changed since production was added.\n>
< To continue using this Job with the new BOM, please update lots and expiries.\n>
< Previous Production Records will not be updated.\n>
msgstr ""
" The Finished Good's BOM has been changed since production was added.\n"
" To continue using this Job with the new BOM, please update lots and expiries.\n"
" Previous Production Records will not be updated.\n"

I need a regex pattern that handles multiline entries like the one above, no matter how many lines need to be changed.

I used this pattern to search:

msgid ""\r\n("(.+?)"\r\n){1,}

And this pattern to replace:

msgid <>\r\n<\2>\r\n

This partly works, but not the way I want: it only keeps the last line and drops the two above it. I'm doing something wrong, but I don't know what.

Suggestions?
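The symptom in the question is how repeated capturing groups behave in backtracking regex engines generally (Notepad++'s Boost-based engine included): when a group sits inside a quantifier, as `("(.+?)"\r\n){1,}` does, every iteration overwrites the previous capture, so `\2` only holds the last line. The minimal Python sketch below reproduces this with the question's own pattern; Python's `re` is used only for illustration and the sample text is made up.

```python
import re

# Hypothetical sample: a msgid with three continuation lines, CRLF endings.
text = 'msgid ""\r\n"line one"\r\n"line two"\r\n"line three"\r\nmsgstr ""\r\n'

# The pattern from the question: groups 1 and 2 sit inside the {1,}
# quantifier, so they are re-captured on every iteration.
pattern = re.compile(r'msgid ""\r\n("(.+?)"\r\n){1,}')

match = pattern.search(text)
last_captured = match.group(2)  # 'line three': earlier lines were overwritten

# The question's replacement, written with Python's \2 backreference syntax.
replaced = pattern.sub(r'msgid <>\r\n<\2>\r\n', text)

print(repr(last_captured))
print(repr(replaced))
```

This is why one replacement cannot rebuild every line from a single repeated group: each quoted line has to be matched and replaced on its own, which is what the `\G`-based pattern in the answer does.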


Solution

  • A single regex is possible, but it is not simple and it won't be efficient. A more efficient solution is to write a parser, or to use a general-purpose programming language and combine at least two regexes: one to extract the msgid block, and a second to replace the quotes inside it.

    A single-regex solution looks like this:

    Find What: (?s)(?:\G(?!^(?<=.))|^msgid)(?:(?!^msg(?:id|str))[^"])*?\K"((?:(?!^msg(?:id|str))[^"])*?)"
    Replace With: <$1>


    Details

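    • (?s) - inline modifier, equivalent to turning ON the ". matches newline" option
    • (?:\G(?!^(?<=.))|^msgid) - either msgid at the start of a line (^msgid), or (|) the end of the previous successful match (\G(?!^(?<=.)))
    • (?:(?!^msg(?:id|str))[^"])*? - zero or more, but as few as possible, chars other than " that do not start a msgid or msgstr sequence at the start of a line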
    • \K - match reset operator, the match buffer gets cleared
    • " - a "
    • ((?:(?!^msg(?:id|str))[^"])*?) - Capturing group 1:
      • (?:(?!^msg(?:id|str))[^"])*? - the same construct as before the opening quote, so group 1 holds the text between the quotes without running into a msgid or msgstr line
    • " - a "

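The two-regex approach mentioned at the start of this answer can be sketched in Python as follows. This is a minimal illustration, not a full PO parser: it assumes each msgid value is a run of double-quoted strings, one per line, directly after the `msgid` keyword. The names `MSGID_BLOCK`, `QUOTED` and `convert_msgid_quotes` are made up for this example. The first regex extracts each msgid block, and the second replaces the quotes only inside that block, so msgstr lines are never touched.

```python
import re

# Regex 1 (block extraction): a `msgid` line plus every directly following
# line that is a double-quoted string. The block ends at the first line that
# does not start with a quote, e.g. `msgstr ""`.
MSGID_BLOCK = re.compile(
    r'^msgid[ \t]+"[^\n]*"\r?\n?(?:[ \t]*"[^\n]*"\r?\n?)*',
    re.MULTILINE,
)

# Regex 2 (quote replacement): one quoted string on a line; group 1 is its
# content. The greedy [^\n]* runs to the last quote on the line, so escaped
# quotes (\") inside the string stay part of the content.
QUOTED = re.compile(r'"([^\n]*)"')


def convert_msgid_quotes(text: str) -> str:
    """Replace "..." with <...> inside msgid blocks only."""

    def fix_block(match: re.Match) -> str:
        # Apply the second regex to the text of one msgid block only.
        return QUOTED.sub(r"<\1>", match.group(0))

    return MSGID_BLOCK.sub(fix_block, text)


if __name__ == "__main__":
    sample = (
        'msgid ""\n'
        '" CONSUMPTION_PLAN_ERR|The BOM has been changed.\\n"\n'
        '" Previous Production Records will not be updated.\\n"\n'
        'msgstr ""\n'
        '" The BOM has been changed.\\n"\n'
    )
    print(convert_msgid_quotes(sample))
```

To process a real file, read it as text (for example a hypothetical `messages.po` via `pathlib.Path.read_text(encoding="utf-8")`), pass it through `convert_msgid_quotes`, and write the result back. Because `msgid` must be followed by whitespace, `msgid_plural` entries are left as they are; the first pattern would need extending if those should be converted too.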