I have been given the task of (trying to do) a search and replace across a large codebase: removing a word suffix, but only where it occurs within comments. All of the comments are of the /* or // type, but they are guaranteed to include most of the edge cases imaginable.
So I want to change this:
/* blah blah something__suffix blah */
to this:
/* blah blah something blah */
but I also want to change this:
// blah blah something__suffix blah
to this:
// blah blah something blah
And this:
/*
* blah blah something__suffix blah
*/
to this:
/*
* blah blah something blah
*/
And this:
/**
// blah blah something__suffix blah
*/
to this:
/**
// blah blah something blah
*/
ad nauseam (literally).
Initially I felt that this was a parser task, so I installed Coccinelle. It could indeed parse my comments, but it got stuck on my preprocessor macros, and the workarounds seemed too complex for a one-off task. So now I'm considering regex.
I haven't found much advice on doing really robust search and replace within C and C++ comments with regex (besides "you need a parser"), but I did notice that the Perl FAQ has a pretty well road-tested Perl script for removing comments in both of these styles, as follows:
$/ = undef;
$_ = <>;
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;
print;
My question: how to adapt this script so that instead of stripping the comment, the text that has been identified as a comment can then be searched for the suffix and the suffix removed, leaving the rest of the comment intact?
You need to do it in two steps, because a single comment might contain more than one occurrence:
/* foo__suffix bar__suffix */
First, extract the comment; then remove every __suffix within it.
s{
   \G
   (?:(?!/[*/]).)*
   \K
   (  /[*] (?:(?![*]/).)* [*]/
   |  // [^\n]*
   )
}{
   my $comment = $1;
   $comment =~ s/(?<=\w)__suffix//g;
   $comment
}xesg;
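As a minimal sketch of how the substitution above could be wrapped and tried out, the following puts it in a sub and runs it on a made-up sample. The sub name strip_suffix_in_comments and the sample text are assumptions for illustration, not part of the original answer. It needs Perl 5.10 or later for \K.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is not from the answer): returns a copy of
# $src with "__suffix" removed inside block and line comments only.
sub strip_suffix_in_comments {
    my ($src) = @_;
    $src =~ s{
        \G
        (?:(?!/[*/]).)*                  # skip code up to the next comment opener
        \K                               # exclude the skipped code from the replacement
        (  /[*] (?:(?![*]/).)* [*]/      # block comment
        |  // [^\n]*                     # line comment
        )
    }{
        my $comment = $1;
        $comment =~ s/(?<=\w)__suffix//g;
        $comment
    }xesg;
    return $src;
}

# Made-up sample input: the suffix appears in code and in both comment styles.
my $sample = "int a__suffix; /* x__suffix y__suffix */\n"
           . "int b__suffix; // z__suffix\n";
print strip_suffix_in_comments($sample);
```

To run it over a real file, slurp the file into a string first (as the FAQ script does by undefining $/) and pass that string to the sub.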
Notes:
(?:(?!STRING).) is to (?:STRING) as [^CHAR] is to CHAR.
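The analogy can be seen on a small made-up string. This is only an illustrative sketch, and the variable names are assumptions: a negated character class stops at the first occurrence of a single character, while the negative-lookahead form stops only where the whole multi-character string begins.

```perl
use strict;
use warnings;

my $text = "aaa/*bbb*ccc*/ddd";

# [^*]* consumes characters until the first '*' of any kind.
my ($upto_char) = $text =~ m{^([^*]*)};

# (?:(?![*]/).)* consumes characters until the two-character string "*/" begins,
# so a lone '*' in the middle does not stop it.
my ($upto_string) = $text =~ m{^((?:(?![*]/).)*)}s;

print "$upto_char\n";      # prints "aaa/"
print "$upto_string\n";    # prints "aaa/*bbb*ccc"
```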
My solution will mess up if you have // or /* in a string literal.
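One possible way around that limitation, borrowing the idea from the FAQ script in the question, is to match string and character literals as well and hand them back unchanged. Because the regex engine takes the leftmost match, a // inside a string is consumed as part of the literal, and a quote inside a comment is consumed as part of the comment. This is a sketch under assumptions, not a tested replacement: the sub name is hypothetical, and it assumes ordinary C-style literals (no raw strings, no digit separators such as 1'000, no stray apostrophes in preprocessor lines).

```perl
use strict;
use warnings;

# Hypothetical helper: like the substitution above, but string and character
# literals are matched first and returned untouched.
sub strip_suffix_skipping_literals {
    my ($src) = @_;
    $src =~ s{
        (  " (?: \\. | [^"\\] )* "       # string literal, kept as is
        |  ' (?: \\. | [^'\\] )* '       # character literal, kept as is
        )
      |
        (  /[*] (?:(?![*]/).)* [*]/      # block comment
        |  // [^\n]*                     # line comment
        )
    }{
        my ($literal, $comment) = ($1, $2);
        $comment =~ s/(?<=\w)__suffix//g if defined $comment;
        defined $literal ? $literal : $comment
    }xesg;
    return $src;
}

# Made-up sample: the first "//" sits inside a string and must be left alone.
my $line = "char *s = \"// not__suffix a comment\"; // real__suffix comment\n";
print strip_suffix_skipping_literals($line);
```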
If you're ok with removing instances of __suffix that aren't preceded by an identifier, you can remove the (?<=\w).
If you're using Perl 5.14 or higher, you can simplify
s{...}{
   my $comment = $1;
   $comment =~ s/(?<=\w)__suffix//g;
   $comment
}xesg;
to
s{...}{
   $1 =~ s/(?<=\w)__suffix//rg
}xesg;
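The reason the longer form copies $1 first is that $1 is read-only, so a plain s/// cannot be applied to it directly. The /r modifier avoids the copy by returning the modified string and leaving its operand untouched. A small sketch of that behaviour, using a made-up comment string:

```perl
use 5.014;    # the /r modifier needs Perl 5.14 or later
use warnings;

my $comment = "/* foo__suffix bar__suffix */";

# With /r the substitution returns the changed copy; $comment itself is unchanged.
my $cleaned = $comment =~ s/(?<=\w)__suffix//gr;

say $comment;    # prints "/* foo__suffix bar__suffix */"
say $cleaned;    # prints "/* foo bar */"
```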