I have built a parser in Sprache and C# for files using a format I don't control. Using it I can correctly convert:
a = "my string";
into
my string
The parser (for the quoted text only) currently looks like this:
public static readonly Parser<string> QuotedText =
from open in Parse.Char('"').Token()
from content in Parse.CharExcept('"').Many().Text().Token()
from close in Parse.Char('"').Token()
select content;
However, the format I'm working with escapes quotation marks by doubling them ("double double" quotes), e.g.:
a = "a ""string"".";
When attempting to parse this nothing is returned. It should return:
a ""string"".
Additionally
a = "";
should be parsed into string.Empty or similar.
I've tried regexes based on answers like this, without success, using patterns like "(?:[^;])*", or:
public static readonly Parser<string> QuotedText =
from content in Parse.Regex("""(?:[^;])*""").Token()
select content;
This doesn't work (i.e. no matches are returned in the above cases). I think my beginner's regex skills are getting in the way. Does anybody have any hints?
EDIT: I was testing it here - http://regex101.com/r/eJ9aH1
If I'm understanding you correctly, this is the kind of regex you're looking for:
"(?:""|[^"])*"
See the demo.
1. " matches an opening quote
2. (?:""|[^"])* matches two quotes or any chars that are not a quote (including newlines), repeating
3. " matches the closing quote.
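Since the question is in C#, here is a minimal sketch of how that pattern could be written and applied with System.Text.RegularExpressions. The class name QuotedTextDemo and the helper Extract are made up for illustration; they are not part of Sprache or of the original parser. The main thing it shows is the escaping: inside a verbatim string (@"..."), every literal quote has to be written as "".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the asker's code or of Sprache.
public static class QuotedTextDemo
{
    // In a verbatim string each literal quote is doubled, so this
    // C# literal encodes the regex: "(?:""|[^"])*"
    private static readonly Regex Quoted = new Regex(@"""(?:""""|[^""])*""");

    // Returns the text between the outer quotes, or null if nothing matches.
    // Doubled quotes inside the text are left exactly as they appear.
    public static string Extract(string input)
    {
        Match m = Quoted.Match(input);
        if (!m.Success) return null;
        return m.Value.Substring(1, m.Value.Length - 2);
    }

    public static void Main()
    {
        Console.WriteLine(Extract("a = \"my string\";"));          // my string
        Console.WriteLine(Extract("a = \"a \"\"string\"\".\";"));  // a ""string"".
        Console.WriteLine(Extract("a = \"\";") == string.Empty);   // True
    }
}
```

The same verbatim literal should also be usable as the argument to the Parse.Regex call from the question, though I have only sketched it against the plain .NET Regex class here. If you would rather get single quotes back, a Replace("\"\"", "\"") on the extracted text would collapse the doubled ones.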
But it's always going to boil down to whether your input is balanced. If not, you'll be getting false positives. And if you have a string such as "string"", which should be matched? "string"", "", or nothing?... That's a tough decision, one that, fortunately, you don't have to make if you are sure of your input.