regex, actionscript-3, grep

Regex for basic grammar formatting


I'm trying to replace/match with a regular expression based on some simple grammatical concepts. It's late, and I approached the regular expression website with the usual confidence that I could get the thing learnt in an evening. I do this about once every six months. Yes, I'm foolish.

Anyway, just in case there are any takers up at this hour (or indeed across the pond), could someone please give me a regular expression that enforces some simple grammatical rules:

  • Commas (,), periods (.), and single quotes (') are never preceded by one or more spaces.
  • Commas (,) and periods (.) are always followed by exactly one space.
  • Commas (,), periods (.), and spaces ( ) never appear twice or more in a row.
  • The first double quote (") in a pair is never followed by one or more spaces, and the last is always followed by one space or a period (.).
  • The last double quote in a pair is never preceded by spaces ( ).

Some general explanation would definitely warrant an upvote, as I'm sure this will help me in my quest for regex understanding.

Sorry to dampen the mood but I'm using Actionscript 3 to implement this. Not sure which regex engine it leverages but no doubt it'll have a few quirks to it. It's worth a shot in any regex implementation you're used to, though.

Here's a visual:

// string before

var string:String = '" Hello ,my name is Shennan ,, "he said  .  ';

string = string.replace(/* your regex magic */, /* replace with */);

trace(string); /* output: "Hello, my name is Shennan," he said. */

Solution

  • This handles the spaces before and after commas and periods:

    var pattern:RegExp = / *([,.]) */g;
    // replace() returns a new string, so assign the result back
    string = string.replace(pattern, "$1 ");
    

    This handles the spaces before single quotes:

    var pattern:RegExp = / *'/g;
    string = string.replace(pattern, "'");
    

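    This collapses runs of repeated commas, periods, or single quotes into one. Apply it before the comma/period spacing replacement, which would otherwise turn ",," into ", , " and leave nothing adjacent to collapse:

    var pattern:RegExp = /([,.'])\1+/g;
    string = string.replace(pattern, "$1");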
    

    There is no easy way to handle paired quotes in general. Quoted speech that runs across several paragraphs, for example, often re-opens the quote at the start of each paragraph without closing it at the end of the previous one. If and only if quotes are guaranteed to be evenly paired, you can use:

    // strips spaces just inside both the opening and the closing quote
    var pattern:RegExp = /" *([^"]*?) *"/g;
    string = string.replace(pattern, '"$1"');
    

     

    // then add a space after any closing quote not already followed by a space or period
    var pattern:RegExp = /("[^"]*")(?![. ])/g;
    string = string.replace(pattern, '$1 ');
    

    ActionScript 3 supports both backreferences and negative lookaheads, so all of the above should work, but admittedly I have not tested them yet (I need to run out).
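
For anyone who wants to try the whole thing end to end, here is a rough sketch of the steps chained together. It is written in TypeScript rather than ActionScript 3 (both follow ECMAScript-style regular expressions, but I have not run this in Flash, so treat the port back as an assumption). The function name tidyPunctuation is made up for illustration. It differs from the snippets above in three ways: repeated punctuation is collapsed first, the quote step uses a replacement callback so that every pair is consumed in order (the lookahead version can pair a closing quote with the next opening quote when an earlier pair is already well formed), and the final trim is my own guess at what the asker wants at the ends of the string.

```typescript
// Hypothetical helper, not from the original answer: chains the replacements
// in an order where each step leaves valid input for the next.
function tidyPunctuation(input: string): string {
  let s = input;

  // 1. Collapse repeats of the same comma or period, even with spaces between.
  s = s.replace(/([,.])(?: *\1)+/g, "$1");

  // 2. No spaces before a comma or period, exactly one space after.
  s = s.replace(/ *([,.]) */g, "$1 ");

  // 3. No spaces before a single quote.
  s = s.replace(/ +'/g, "'");

  // 4. Assumes double quotes are evenly paired. Strip spaces just inside each
  //    pair, then keep a following period or space, or add a space if neither.
  s = s.replace(
    /" *([^"]*?) *"([. ]?)/g,
    (_match: string, inner: string, next: string) =>
      '"' + inner + '"' + (next === "" ? " " : next)
  );

  // 5. Collapse leftover runs of spaces and trim the ends (an assumption).
  return s.replace(/ {2,}/g, " ").trim();
}

console.log(tidyPunctuation('" Hello ,my name is Shennan ,, "he said  .  '));
// prints: "Hello, my name is Shennan," he said.
```

Note that step 2 is blunt: it will also put a space inside things like decimal numbers or abbreviations, so it only suits plain prose like the sample.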