Listing All Punctuation In A String Using ColdFusion reFindNoCase

I'm unable to output a list of all punctuation using reFindNoCase, whether I call it with the POSIX character class [:punct:] or with the actual list of characters it represents (escaped, of course), as shown here: http://www.regular-expressions.info/posixbrackets.html

I expect reFindNoCase to return the position and length of every punctuation character in the string in the following example, but that's not the case:

    var strRegexString = "!""##$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~";
    var reObjMatchPunctuation = reFindNoCase("([!""##$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])", LOCAL.strRegexString, 1, True);
    LOCAL.reObjMatchPunctuation._match = [];
    for (var i=1; i <= arrayLen(LOCAL.reObjMatchPunctuation.pos); i++){
        if (LOCAL.reObjMatchPunctuation.pos[i] == 0){
            arrayAppend(LOCAL.reObjMatchPunctuation._match, "NO MATCH");
        }else if (LOCAL.reObjMatchPunctuation.len[i] == 0){
            arrayAppend(LOCAL.reObjMatchPunctuation._match, "ZERO-LENGTH MATCH");
        }else{
            arrayAppend(LOCAL.reObjMatchPunctuation._match, mid(LOCAL.strRegexString, LOCAL.reObjMatchPunctuation.pos[i], LOCAL.reObjMatchPunctuation.len[i]));
        }
    }
    writeDump(LOCAL.reObjMatchPunctuation);

Changing the regex to the following produces the same result:

    var reObjMatchPunctuation = reFindNoCase("([[:punct:]])", LOCAL.strRegexString, 1, True);

The result of the above snippet is shown below. It should have 36 matches, but that's not the case:

[Image: dump of LOCAL.reObjMatchPunctuation]

Using cfRegex from cfregex.net with the following snippet gives the result I was expecting from reFindNoCase:

    var strRegexString = "!""##$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]";
    var reObjMatchPunctuation = new cfc.cfregex.regex("(\p{Punct})");
    WriteDump(LOCAL.reObjMatchPunctuation.find(LOCAL.strRegexString,1,0,"info"));

Which outputs:

[Long image: output of dump from LOCAL.reObjMatchPunctuation]

Which is correct. Is there something wrong with my syntax? Is there a known bug in reFind and/or reFindNoCase that explains this?

Note: I used part of a code snippet from Adam Cameron's blog post to make the results easier to view: http://blog.adamcameron.me/2013/01/regular-expressions-in-coldfusion-part_10.html

Solution

I just had a closer look at your code. Your assertion is incorrect. The find functions only find the first match of the string / regex in the target string. So your find operation does exactly that: it finds the first match of that regex, which is the first character in the string. Your returned arrays have two elements because you have a subexpression capture in your regex (the ()), so the first element is the entire match and the second is the subexpression. In this case, both are the same single character.

To achieve what you want, you need to put your reFind() call (it doesn't need to be reFindNoCase(), as you're not looking for alphabetic characters) into a loop, and incrementally advance the starting point of the find operation (i.e. the third argument).
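
A minimal sketch of that loop, assuming CFScript on ColdFusion 9 or later. The function name findAllMatches and the keys of the structs it returns are hypothetical, chosen only for illustration; reFind(), mid(), max(), len() and arrayAppend() are the only built-ins it relies on:

    // Hypothetical helper (not part of the original post): collects every match
    // by calling reFind() repeatedly and moving the start position forward.
    function findAllMatches(required string regex, required string input){
        var results = [];
        var start = 1;
        var found = {};
        var item = {};
        while (start <= len(input)){
            // The fourth argument (true) makes reFind() return pos/len arrays.
            found = reFind(regex, input, start, true);
            if (found.pos[1] == 0){
                break; // no further matches
            }
            item = {};
            item.pos = found.pos[1];
            item.len = found.len[1];
            // Guard against zero-length matches before calling mid().
            item.match = found.len[1] > 0 ? mid(input, found.pos[1], found.len[1]) : "";
            arrayAppend(results, item);
            // Resume after this match; advance at least one character so a
            // zero-length match cannot cause an infinite loop.
            start = found.pos[1] + max(found.len[1], 1);
        }
        return results;
    }

    writeDump(findAllMatches("[[:punct:]]", LOCAL.strRegexString));

Element 1 of the pos and len arrays always describes the whole match, so the sketch ignores any subexpression entries and works with or without the capturing parentheses.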

Or use reMatch(), e.g.:

    stringToInspect = "!""##$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~";

    regex = "[[:punct:]]";
    matches = reMatch(regex, stringToInspect);
    writeDump(var=matches, label="Using #regex#");

    regex = "[#stringToInspect#]";
    matches = reMatch(regex, stringToInspect);
    writeDump(var=matches, label="Using #regex#");

Note that both examples here return 35 matches: your original test string had ] duplicated (in the middle just after \\\, and again at the end).