SHORT QUESTION
Suppose we have a regex that reads a string enclosed in double quotes. The string is valid only if it has NO double quotes inside.
("([^"]+)")
How would one write a regex that does the same thing but also accepts a string containing double quotes, provided each one is preceded by a backslash?
"Valid string" //VALID
"Valid \"string\"" //VALID
"Invalid " + "string" //INVALID
"Invalid " + "\"string\"" //INVALID
LONG QUESTION
I'm building my own gettext implementation - I found out that the official gettext apps ( http://www.gnu.org/s/gettext/ ) are not sufficient for my needs.
That means I need to find all strings inside each C# code file myself, but only those which are passed to a particular function as the only parameter.
I built a regex which gets most of the strings. The function Translate is public, static and is situated in the namespace GetTextLocalization and in the class Localization.
(GetTextLocalization\.)?(Localization\.Translate)\("([^"]+)"\)
Of course, this will ONLY find plain string literals passed on their own. If the argument is an expression ("string a" + "string b") or a verbatim string (@"Verbatim string"), it will not match, but that is not the problem.
The regex definition:
([^"]+)
says that there must be no double quotes inside the string. I know that no one in the company concatenates strings while passing them as the parameter; still, I need this restriction as a "what if" safety measure.
But that also causes the problem. The double quotes actually can be there.
Localization.Translate("Perfectly valid String with \"double quotes\"")
I need to change the regex so that it accepts strings containing double quotes, but only those preceded by a backslash (so I still skip anything like Translate("a" + "b"), which would mess with the translation catalog).
I thought I might need to use the (?!) negative lookahead construct somehow, but I have no idea where to place it.
Since you probably also want to allow escaped (doubled) backslashes before the closing quote, I suggest
"(?:\\.|[^"\\])*"
Explanation:
" # Match "
(?: # Either match
\\. # an escaped character
| # or
[^"\\] # any character except " or \
)* # any number of times.
" # Match "
This matches "hello", "hello\"there" or "hello\\" in full, but fails to match the whole of "hello" there" or "hello\\" there".
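To check the behaviour described above, here is a minimal sketch in Python (chosen only for illustration; the pattern uses nothing beyond constructs that the .NET regex engine also supports). The names STRING_LITERAL, TRANSLATE_CALL, is_single_literal and find_translatable are hypothetical, and combining the pattern with the Translate regex from the question is my assumption about how it would be used, not something stated in the original.

```python
import re

# The suggested pattern: an opening quote, then any number of escaped
# characters or ordinary characters (anything except " and \), then a quote.
STRING_LITERAL = r'"(?:\\.|[^"\\])*"'

# Assumed combination with the Translate regex from the question.
# The single capture group holds the literal's contents without the quotes.
TRANSLATE_CALL = re.compile(
    r'(?:GetTextLocalization\.)?Localization\.Translate'
    r'\("((?:\\.|[^"\\])*)"\)'
)


def is_single_literal(text):
    """Return True if the whole text is exactly one double-quoted literal."""
    return re.fullmatch(STRING_LITERAL, text) is not None


def find_translatable(source):
    """Return the raw contents of each Translate("...") call whose only
    argument is a single string literal."""
    return TRANSLATE_CALL.findall(source)


if __name__ == "__main__":
    samples = [
        r'"Valid string"',
        r'"Valid \"string\""',
        r'"Invalid " + "string"',
        r'"Invalid " + "\"string\""',
    ]
    for sample in samples:
        print(sample, is_single_literal(sample))
```

Note that the check uses a whole-string match. With an unanchored search, the pattern would still find the leading "Invalid " literal inside a concatenation; in the Translate case it is the surrounding \(" and "\) that rule those calls out.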