Find only multiline C comments, not single-line C comments


Suppose I have this text:

cat file
/* comment */ not a comment /* another comment */

/* delete this  *
/* multiline    *
/* comment      */

/*************
/* and this  *  
/************/
The End

I can use Perl with a conditional ? : in the replacement to delete only the multiline comments:

perl -0777 -pE 's/(\/\*(?:\*(?!\/)|[^*])*\*\/)/($1=~qr"\R") ? "" : $1/eg;' file

Prints:

/* comment */ not a comment /* another comment */




The End

Without the conditional, every comment is deleted, including the single-line ones:

perl -0777 -pE 's/(\/\*(?:\*(?!\/)|[^*])*\*\/)//g;' file
 not a comment 




The End

Is there a way to delete only multiline C-style comments with a regex alone, i.e., without using Perl conditional code in the replacement?


Solution

  • With a SKIP/FAIL approach:

    perl -0777 -pe's~/\*\N*?\*/(*SKIP)^|/\*.*?\*/~~gs' file
    


    \N matches any character except a newline, regardless of the s flag.
    The dot matches any character, including newlines, because the s flag is used.

    The first branch matches "inline" comments and is then forced to fail with ^ (shorter than writing (*F) or (*FAIL), with the same result here: without the m flag, ^ only matches at the start of the string, and the position after a comment is never the start). The (*SKIP) backtracking control verb prevents the regex engine from retrying earlier positions, so the next attempt starts after the closing */.

    The second branch matches the remaining comments, which are necessarily multiline.


    A shorter variant, with the same two branches, but this time using \K to exclude the consumed characters from the match result:

    perl -0777 -pe's~/\*\N*?\*/\K|/\*.*?\*/~~gs' file
    


    This time the first branch succeeds, but since all characters before \K are dropped from the match result, only an empty string remains. Replacing that empty string with an empty string leaves the inline comment untouched.


    These two search/replace patterns aren't very different from the more portable:

    s~(/\*.*?\*/)|/\*[\s\S]*?\*/~$1~g
    

    but they take less effort (no capture group needed, empty replacement string).