c# .net regex gettext

C# file gettext-string regex parser


SHORT QUESTION

Take a regex that matches a string enclosed in double quotes. The string is valid only if it contains NO double quotes.

("([^"]+)")

How would one write a regex with the same behavior that also accepts a string containing double quotes, as long as each of them is preceded by a backslash?

"Valid string"      //VALID
"Valid \"string\""  //VALID
"Invalid " + "string"  //INVALID
"Invalid " + "\"string\""  //INVALID

LONG QUESTION

I'm building my own gettext implementation, because I found that the official gettext tools ( http://www.gnu.org/s/gettext/ ) are not sufficient for my needs.

That means I need to find all strings inside each C# code file myself, but only those which are passed to a particular function as the only parameter.

I built a regex which gets most of the strings. The function Translate is public, static and is situated in the namespace GetTextLocalization and in the class Localization.

(GetTextLocalization\.)?(Localization\.Translate)\("([^"]+)"\)

Of course, this will ONLY find plain string literals passed on their own, and it won't find verbatim strings. If the parameter is an expression ("string a" + "string b") or a verbatim string (@"Verbatim string"), it will not be matched, but that is not the problem.

The regex definition:

([^"]+)

says that there must be no double quotes inside the string, and I know that no one in the company concatenates strings while passing them as the parameter. Still, I need to keep this construction as a "what if" safety measure.

But that also causes the problem. The double quotes actually can be there.

Localization.Translate("Perfectly valid String with \"double quotes\"")
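To make the failure concrete, here is a minimal sketch that runs the pattern above against a few calls. The class name, the helper `Extract` and the sample source lines are made up for illustration; only the regex itself comes from this question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The pattern from the question, written as a C# verbatim string
    // (inside @"..." a double quote is written as "").
    public static readonly Regex TranslateCall = new Regex(
        @"(GetTextLocalization\.)?(Localization\.Translate)\(""([^""]+)""\)");

    // Returns the captured string body (group 3), or null when the call is not matched.
    public static string Extract(string sourceLine)
    {
        Match m = TranslateCall.Match(sourceLine);
        return m.Success ? m.Groups[3].Value : null;
    }

    public static void Main()
    {
        // Hypothetical source lines, for illustration only.
        string[] lines =
        {
            @"Localization.Translate(""Plain string"")",
            @"Localization.Translate(""a"" + ""b"")",
            @"Localization.Translate(@""Verbatim string"")",
            @"Localization.Translate(""Perfectly valid String with \""double quotes\"""")",
        };

        foreach (string line in lines)
            Console.WriteLine("{0} => {1}", line, Extract(line) ?? "(no match)");
    }
}
```

Only the first line is matched. The concatenation and the verbatim string are skipped as intended, but the last call is skipped too, because `[^"]+` stops at the first escaped quote.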

I need to change the regex so that it accepts strings containing double quotes, but only quotes that are preceded by a backslash. That way I still skip anything like Translate("a" + "b"), which would mess up the translation catalog.

I thought I might need to use the (?!) grouping construct (negative lookahead) somehow, but I have no idea where to place it.


Solution

  • Since you probably want to allow doubled backslashes before a quote, I suggest

    "(?:\\.|[^"\\])*"
    

    Explanation:

    "        # Match "
    (?:      # Either match
     \\.     # an escaped character
    |        # or
     [^"\\]  # any character except " or \
    )*       # any number of times.
    "        # Match "
    

    This matches "hello", "hello\"there" or "hello\\" in full, but it does not match the whole of "hello" there" or "hello\\" there": in those cases the match ends at the first unescaped quote.
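    As a rough sketch of how this could be used from C#, the code below drops the pattern into the question's Translate(...) regex and also offers an anchored variant for checking that a whole input is a single literal. The names `TranslateScanner`, `IsSingleLiteral` and `ExtractMessages` are hypothetical, and the prefix groups are made non-capturing so that the string body lands in group 1. It is a plain text scan, so it would presumably also pick up calls that appear inside comments.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TranslateScanner
{
    // The literal pattern "(?:\\.|[^"\\])*" with a capture group around the body,
    // written as a C# verbatim string (a double quote is written as "").
    private const string Literal = @"""((?:\\.|[^""\\])*)""";

    // Anchored form: the whole input must be exactly one string literal.
    public static readonly Regex WholeLiteral = new Regex("^" + Literal + "$");

    // The question's Translate(...) pattern with the literal part swapped in.
    public static readonly Regex TranslateCall = new Regex(
        @"(?:GetTextLocalization\.)?Localization\.Translate\(" + Literal + @"\)");

    public static bool IsSingleLiteral(string text)
    {
        return WholeLiteral.IsMatch(text);
    }

    // Collects the raw text between the quotes of every matched call.
    // Escape sequences such as \" are returned as written, not unescaped.
    public static List<string> ExtractMessages(string source)
    {
        var result = new List<string>();
        foreach (Match m in TranslateCall.Matches(source))
            result.Add(m.Groups[1].Value);
        return result;
    }

    public static void Main()
    {
        // Hypothetical source text, for illustration only.
        string source =
            "Localization.Translate(\"Perfectly valid String with \\\"double quotes\\\"\");\n" +
            "Localization.Translate(\"a\" + \"b\");\n";

        foreach (string message in ExtractMessages(source))
            Console.WriteLine(message);

        Console.WriteLine(IsSingleLiteral("\"Valid \\\"string\\\"\""));
        Console.WriteLine(IsSingleLiteral("\"Invalid \" + \"string\""));
    }
}
```

    The alternation consumes a backslash together with the character after it, so an escaped quote can never be taken for the closing quote. The concatenated call is still rejected because `\)` has to follow the closing quote immediately.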