I'm trying to replace/match with a regular expression based on some simple grammatical concepts. It's late, and I approached the regular expression website with the usual confidence that I could get the thing learnt in an evening. I do this about once every six months. Yes, I'm foolish.
Anyway, just in case there are any takers up at this hour (or indeed across the pond), could someone please give me a regular expression that upholds some simple grammatical rules:
Some general explanation would definitely warrant an upvote, as I'm sure this will help me in my quest for regex understanding.
Sorry to dampen the mood, but I'm using ActionScript 3 to implement this. I'm not sure which regex engine it uses, and no doubt it has a few quirks. It's worth a shot in any regex implementation you're used to, though.
Here's a visual:
// string before
var string:String = '" Hello ,my name is Shennan ,, "he said . ';
string = string.replace(/* your regex magic */, /* replace with */);
trace(string); /* output: "Hello, my name is Shennan," he said. */
This handles the spaces before and after commas and periods:
var pattern:RegExp = / *([,.]) */g;
string = string.replace(pattern, "$1 "); // replace() returns a new string, so assign the result
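To make the behaviour of that pattern concrete, here is a minimal TypeScript sketch rather than ActionScript 3. The regex literal is copied unchanged on the assumption that both languages accept the same ECMAScript-style syntax; the helper name `normalizeSpacing` is hypothetical.

```typescript
// Same pattern as above: optional spaces, one comma or period, optional spaces.
const spacing = / *([,.]) */g;

// Hypothetical helper name, not part of the original answer.
function normalizeSpacing(text: string): string {
  // "$1" is the captured punctuation mark; exactly one space is put back after it.
  return text.replace(spacing, "$1 ");
}

console.log(JSON.stringify(normalizeSpacing("Hello ,my name is Shennan . ")));
// prints "Hello, my name is Shennan. "
```

Note that the pattern cannot tell a sentence period from a decimal point, so `3.14` would come out as `3. 14`, and a period at the end of the string keeps one trailing space.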
This handles the spaces before single quotes:
var pattern:RegExp = / *'/g;
string = string.replace(pattern, "'");
This handles repeated commas, periods, and single quotes:
var pattern:RegExp = /([,.'])\1*/g;
string = string.replace(pattern, "$1");
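The order in which this step and the spacing step run seems to matter. The TypeScript sketch below (not ActionScript 3; the sample input and variable names are made up for illustration) applies the two patterns in both orders. If the spacing pattern runs first, each comma in `,,` gets its own trailing space, so the two commas are no longer adjacent and the backreference has nothing to collapse.

```typescript
const repeats = /([,.'])\1*/g; // a mark followed by zero or more copies of itself
const spacing = / *([,.]) */g; // the comma/period spacing pattern

const input = "Shennan ,, he said ..";

// Collapse repeats first, then fix the spacing.
const collapseFirst = input.replace(repeats, "$1").replace(spacing, "$1 ");

// Fix the spacing first, then try to collapse repeats.
const spacingFirst = input.replace(spacing, "$1 ").replace(repeats, "$1");

console.log(JSON.stringify(collapseFirst)); // prints "Shennan, he said. "
console.log(JSON.stringify(spacingFirst));  // prints "Shennan, , he said. . "
```

Under these assumptions, collapsing repeats before normalizing spaces is the safer order.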
There is no easy way to handle paired quotes because, for example, quoted material (e.g. speech) that is broken up into paragraphs often re-opens quotes without closing the quotes in the previous paragraphs. If and only if quotes are guaranteed to be paired evenly, you can use:
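// Remove spaces after an opening quote.
var pattern:RegExp = /" *([^"]*)"/g;
string = string.replace(pattern, '"$1"');
// Add a space after a closing quote unless a period or space already follows.
pattern = /("[^"]*")(?![. ])/g;
string = string.replace(pattern, '$1 ');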
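One way to respect the "paired evenly" condition is to count the quotes first and leave the text alone when the count is odd. The following is a minimal TypeScript sketch of that idea, not ActionScript 3; the helper `tidyPairedQuotes` and the even-count check are assumptions, and an even count does not prove the quotes are paired the way a reader would pair them.

```typescript
// Hypothetical helper: applies the two quote patterns only when the quote count is even.
function tidyPairedQuotes(text: string): string {
  const quoteCount = (text.match(/"/g) || []).length;
  if (quoteCount % 2 !== 0) {
    return text; // unbalanced quotes: leave the text untouched rather than guess
  }
  return text
    .replace(/" *([^"]*)"/g, '"$1"')       // drop spaces after an opening quote
    .replace(/("[^"]*")(?![. ])/g, "$1 "); // add a space after a closing quote
}

console.log(tidyPairedQuotes('" Hello,"he said.')); // prints "Hello," he said.
console.log(tidyPairedQuotes('" Hello, he said.')); // prints " Hello, he said. (unchanged)
```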
ActionScript 3 supports backreferences as well as negative lookaheads, so all of the above should work, but admittedly I have not tested them (yet, as I need to run out).
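For anyone who wants to check the pieces together, here is a TypeScript sketch (not ActionScript 3) that chains them on the string from the question. The helper name `tidyPunctuation`, the step order, the extra ` *` before the closing quote in step 3, and the final trim in step 5 are all additions made for this sketch, and it assumes double quotes are evenly paired.

```typescript
// Hypothetical helper combining the steps; the name and the step order are assumptions.
function tidyPunctuation(text: string): string {
  return text
    .replace(/([,.'])\1*/g, "$1")          // 1. collapse runs of the same mark
    .replace(/ *([,.]) */g, "$1 ")         // 2. no space before, one space after , and .
    .replace(/" *([^"]*?) *"/g, '"$1"')    // 3. trim spaces just inside each quote pair
    .replace(/("[^"]*")(?![. ])/g, "$1 ")  // 4. space after a closing quote
    .replace(/^ +| +$/g, "");              // 5. strip leading and trailing spaces
}

const before = '" Hello ,my name is Shennan ,, "he said . ';
console.log(tidyPunctuation(before));
// prints: "Hello, my name is Shennan," he said.
```

Step 3 uses a lazy `*?` so that the spaces before the closing quote are left for the trailing ` *` to consume. The regex literals should carry over to ActionScript 3 as written, but that has not been verified here.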