
I solved a regexp issue associated with IE9, but I'm not sure how or why it works


Background

I have recently had a problem with a regular expression not working as expected in IE9. I tracked the issue down to a specific block inside the expression, namely [^].

var reg = /((?:abc.[^]*?)?test\s*(?:xyz)?\s*)[^]*?/;

The problem

var str = 'abc 123\nabc 123\nabc 123\ntest xyz';
var reg = /((?:abc.[^]*?)?test\s*(?:xyz)?\s*)[^]*?/;
alert(reg.exec(str));

In other words:

Input:

abc 123
abc 123
abc 123
test xyz

Output

Expected: ["abc 123\nabc 123\nabc 123\ntest xyz","abc 123\nabc 123\nabc 123\ntest xyz"]

Chrome: ["abc 123\nabc 123\nabc 123\ntest xyz","abc 123\nabc 123\nabc 123\ntest xyz"]

IE9: ["test xyz", "test xyz"] // Wrong!!!

Attempted solution

I found that the [^] block is causing the error. By simply switching [^] to [\S\s] I was able to attain the expected output in IE9.

var str = 'abc 123\nabc 123\nabc 123\ntest xyz';
var reg = /((?:abc.[\S\s]*?)?test\s*(?:xyz)?\s*)[\S\s]*?/;
alert(reg.exec(str));

Output

Expected: ["abc 123\nabc 123\nabc 123\ntest xyz","abc 123\nabc 123\nabc 123\ntest xyz"]

Chrome: ["abc 123\nabc 123\nabc 123\ntest xyz","abc 123\nabc 123\nabc 123\ntest xyz"]

IE9: ["abc 123\nabc 123\nabc 123\ntest xyz","abc 123\nabc 123\nabc 123\ntest xyz"]
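One way to check whether a given engine treats the two forms identically is to run both patterns against the same input and compare the results. This is only a minimal sketch reusing the string and patterns above; the helper name sameMatch and the variable names are hypothetical, not part of any library.

```javascript
// Hypothetical helper: compares the exec() results of two patterns on one input.
function sameMatch(a, b, input) {
  var ma = a.exec(input);
  var mb = b.exec(input);
  if (ma === null || mb === null) {
    return ma === mb; // equal only if both failed to match
  }
  // Same start position, same full match and same captured groups.
  return ma.index === mb.index && ma.join('\u0000') === mb.join('\u0000');
}

var str = 'abc 123\nabc 123\nabc 123\ntest xyz';
var withCaret = /((?:abc.[^]*?)?test\s*(?:xyz)?\s*)[^]*?/;
var withSpaces = /((?:abc.[\S\s]*?)?test\s*(?:xyz)?\s*)[\S\s]*?/;

// true on an engine that follows the spec; the IE9 output above suggests false there.
var agree = sameMatch(withCaret, withSpaces, str);
```

If agree is false in some engine, the [\S\s] form is the safer one to ship, since it gave the expected output in both browsers tested above.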

Question

So what is the essential difference between [^] and [\S\s]? What is the problem here? Am I just dealing with an edge-case in the IE-javascript engine?


Solution

  • There is no difference between [^] and [\s\S]. [^] is part of the JavaScript specification, but IE9 does not handle it correctly, as is the case with many other JavaScript features.

    As far as I know, [^] is particular to JavaScript; I have never seen it in another regex flavour. In other flavours, [^] is treated either as a syntax error or as an unclosed character class (in the latter case the closing bracket does not end the class, because it immediately follows the ^; the class is only closed by the next closing bracket, if there is one).

    Note that [^] and [] have been allowed ever since regex support was first added to the language (ECMA-262, 3rd edition, December 1999).

    In the ECMA-262 3rd edition specification (section 15.10.2.13), a negated character class is defined like this:

    CharacterClass :: [^ ClassRanges ]
    

    where ClassRanges can be empty or not.

    This definition is unchanged in the 6th edition (June 2015).
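The behaviour that follows from an empty ClassRanges can be sketched as below. This assumes an engine that implements the specification as written (for example a current browser or Node.js); the variable names are illustrative only.

```javascript
// Negated empty class: "any character not in the empty set", i.e. any character.
var anyChar = /^[^]$/;
// Portable spelling: whitespace or non-whitespace.
var anyCharPortable = /^[\s\S]$/;
// For contrast, the dot does not match line terminators.
var dot = /^.$/;
// Empty class: "a character in the empty set", so it can never match.
var nothing = /[]/;

var samples = ['a', ' ', '\n', '\r', '\u2028'];
var results = samples.map(function (ch) {
  return {
    caret: anyChar.test(ch),
    portable: anyCharPortable.test(ch),
    dot: dot.test(ch)
  };
});
```

On a conforming engine, caret and portable agree for every sample, while dot is false for the line terminators '\n', '\r' and '\u2028'.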