javascript regex bbcode phpbb3

JavaScript: making a BBcode regex ungreedy


Thanks to another question, "removing BBcode from textarea with Javascript",
I managed to create this: http://jsfiddle.net/hVgAh/1/

text = $('textarea').val();
while (text.match(/\[quote.*\[\/quote\]/i) != null) {
   //remove the least inside the innermost found quote tags
   text = text.replace(/^(.*)\[quote.*?\[\/quote\](.*)$/gmi, '\$1\$2');
}
text = text.replace(/\[\/?[^\[\]]+\]/gmi,'');
// now strip anything non-character
//text = text.replace(/[^a-z0-9]/gmi, '');
char = text.length;
$('div').text(text);

This code does remove the quote BBcode (and other BBcode as well), but it only removes the content of the deepest quote, or the last quote it sees. I think the reason is that the regex is greedy, but when I tried to make it non-greedy by adding ?, it didn't work: http://jsfiddle.net/hVgAh/2/

I need to remove all the quotes along with their content. How can I do that?


Solution

  • There is no need to strip the newlines: to match any character including newlines, use [\s\S] instead of the dot (.).

    The multiline modifier m, which makes the anchors ^ and $ match at the beginning and end of each line instead of the whole string, is also unnecessary.

    Here is a solution that also avoids the repeated match calls:

    var t;
    // Repeat until a pass removes nothing, i.e. the text stops changing.
    while ( t !== text ) {
        t = text;
        //text = text.replace( /\[quote(?:(?!\[quote)[\s\S])+?\[\/quote\]/g, '' );
        text = text.replace( /^([\s\S]*)\[quote[\s\S]+?\[\/quote\]/g, '$1');
    }
    

    The commented-out line is an alternative version which should work equally well.

    The active line relies on the greedy [\s\S]* prefix to reach the last [quote in the text. The alternative instead uses a negative look-ahead, (?!\[quote), to ensure that only the deepest quote tags are matched. The look-ahead is grouped with the [\s\S] so that it is tested before every character between the quote tags, which prevents a match if [quote appears inside.

    It is hard to say which would be more efficient.

    See JSFIDDLE.
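
For anyone who wants to try both variants outside the fiddle, here is a minimal sketch that wraps each regex from the answer in its own function and runs it on a nested, multi-line sample. The function names (stripQuotesGreedy, stripQuotesLookahead) and the sample string are assumptions made up for illustration, not part of the original answer. The last two lines illustrate the point about the dot not matching newlines.

```javascript
// Sketch only: function names and the sample text are illustrative assumptions.

// Variant 1: the greedy [\s\S]* prefix backtracks to the LAST "[quote",
// and the lazy [\s\S]+? stops at the first "[/quote]" after it,
// so each pass removes one innermost quote block.
function stripQuotesGreedy(text) {
    var previous;
    while (previous !== text) {          // stop once a pass changes nothing
        previous = text;
        text = text.replace(/^([\s\S]*)\[quote[\s\S]+?\[\/quote\]/g, '$1');
    }
    return text;
}

// Variant 2: the negative look-ahead (?!\[quote) is tested before every
// character of the body, so a match can never contain a nested "[quote".
function stripQuotesLookahead(text) {
    var previous;
    while (previous !== text) {
        previous = text;
        text = text.replace(/\[quote(?:(?!\[quote)[\s\S])+?\[\/quote\]/g, '');
    }
    return text;
}

// Nested quotes spread over several lines, plus an unrelated tag.
var sample = 'before\n[quote=a]outer\n[quote=b]inner[/quote]\nmore[/quote]\nafter [b]bold[/b]';

console.log(JSON.stringify(stripQuotesGreedy(sample)));
console.log(JSON.stringify(stripQuotesLookahead(sample)));

// The dot does not cross line breaks, while [\s\S] does.
console.log(/\[quote.*\[\/quote\]/.test('[quote]a\nb[/quote]'));      // false
console.log(/\[quote[\s\S]*\[\/quote\]/.test('[quote]a\nb[/quote]')); // true
```

For this sample both functions return "before\n\nafter [b]bold[/b]": the quotes and their content are gone, and the remaining [b] tags are left for the generic tag-stripping replace from the question.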