Tags: javascript, regex, regex-lookarounds, regex-group, regex-greedy

Regex optimization and best practice


I need to parse information out of a legacy interface, and we cannot change the legacy message format. I'm not very proficient at regular expressions, but I managed to write one that does what I want it to do. I'd like a peer review and feedback to make sure it's clean.

The legacy system returns messages resembling the example below.

%name0=value
%name1=value
%name2=value
Expression: /\%(.*)\=(.*)/g;
var strBody = body_text.toString();
var myRegexp = /\%(.*)\=(.*)/g;
var match = myRegexp.exec(strBody);
var objPair = {};

while (match != null) {
    if (match[1]) {
        objPair[match[1].toLowerCase()] = match[2];
    }
    match = myRegexp.exec(strBody);
}

This code works, and I can insert partial matches in the middle of the name/value lines without anything breaking. I have to assume that any combination of characters could appear in the value match, which means a value could contain equals and percent signs.

  • Is this clean enough?
  • Is there something that could break the expression?

Solution

  • First of all, don't escape characters that don't need escaping: %(.*)=(.*)

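    The problem with your expression: an equals sign in the value would break your parser. Because the first (.*) is greedy, it runs to the last equals sign on the line, so %name0=val=ue would be parsed as the name "name0=val" with the value "ue" instead of the name "name0" with the value "val=ue".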

    One possible fix is to make the first repetition lazy by appending a question mark: %(.*?)=(.*)

    But this is not optimal due to unneeded backtracking. You can do better by using a negated character class: %([^=]*)=(.*)

    And finally, if empty names should not be allowed, replace the first asterisk with a plus: %([^=]+)=(.*)

    This is a good resource: Regex Tutorial - Repetition with Star and Plus
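The progression of patterns in the answer can be checked with a small sketch like the one below. The sample strings are made up for illustration, and parsePairs is a hypothetical helper name, not part of the original code; it also swaps the question's exec loop for String.prototype.matchAll (ES2020), which yields the same capture groups.

```javascript
// Each pattern discussed above, from the original to the final version.
const patterns = {
  original: /\%(.*)\=(.*)/,
  unescaped: /%(.*)=(.*)/,
  lazy: /%(.*?)=(.*)/,
  negatedClass: /%([^=]*)=(.*)/,
  nonEmptyName: /%([^=]+)=(.*)/,
};

// Hypothetical input: the value itself contains "=" and "%".
const line = "%name0=val=ue%x";

// Show how each pattern splits the line into name and value.
for (const [label, re] of Object.entries(patterns)) {
  const m = re.exec(line);
  console.log(label.padEnd(12), "name:", m[1], "value:", m[2]);
}
// The two greedy patterns report the name "name0=val";
// the other three report the name "name0" and the value "val=ue%x".

// Parse every %name=value line of a message into an object,
// lower-casing names as the question's loop does.
function parsePairs(text) {
  const pairs = {};
  // matchAll needs the g flag; the leading comma skips the full match.
  for (const [, name, value] of text.matchAll(/%([^=]+)=(.*)/g)) {
    pairs[name.toLowerCase()] = value;
  }
  return pairs;
}

// Hypothetical message: the second line has an empty name and is skipped.
const body = "%Name0=value\n%=orphan\n%name1=a=b%c";
const parsed = parsePairs(body);
console.log(parsed.name0); // value
console.log(parsed.name1); // a=b%c
```

With the + quantifier, the question's if (match[1]) guard is no longer needed, because an empty name can never match. One caveat worth testing against real data: unlike ., the class [^=] also matches line breaks, so a line with no = at all would be swallowed into the next line's name; [^=\n] avoids that if such lines can occur.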