
Removing strings from C source code


Can anyone point me to a program that strips strings from C source code? For example:

#include <stdio.h>
static const char *place = "world";
char * multiline_str = "one \
two \
three\n";
int main(int argc, char *argv[])
{
        printf("Hello %s\n", place);
        printf("The previous line says \"Hello %s\"\n", place);
        return 0;
}

becomes

#include <stdio.h>
static const char *place = ;
char * multiline_str = ;
int main(int argc, char *argv[])
{
        printf(, place);
        printf(, place);
        return 0;
}

What I am looking for is a program very much like stripcmt, except that I want to strip strings rather than comments.

The reason I am looking for an existing program rather than some handy regular expression is that once you start considering all the corner cases (quotes within strings, multi-line strings, etc.), things typically turn out to be (much) more complex than they first appear. There are also limits on what REs can achieve, and I suspect this task is beyond them. If you do think you have an extremely robust regular expression, feel free to submit it, but please no naive suggestions like sed 's/"[^"]*"//g'.

(There is no need for special handling of (possibly unterminated) strings within comments; comments will be removed first.)

Support for strings containing raw embedded newlines is not important (they are not legal C), but strings that span multiple lines, with each line ending in \, must be supported.

This is almost the same as some other questions, but I found no reference to any tools in them.


Solution

  • You can download the source code to StripCmt (.tar.gz - 5kB). It's trivially small, and shouldn't be too difficult to adapt to stripping strings instead (it's released under the GPL).

    You might also want to investigate the official lexical rules for C string literals. I found this very quickly, but it might not be definitive. It defines a string as:

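    stringcon ::= "{ch}", where ch denotes any printable ASCII character (as specified by isprint()) other than " (double quotes) and the newline character.

    Note that this rule, as quoted, says nothing about escape sequences such as \" or about a backslash at the end of a line, so it does not cover the corner cases the question asks for.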
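
If adapting StripCmt turns out to be awkward, the same job can be sketched as a small state machine. The following is only a minimal sketch, not StripCmt's code: the function name strip_strings and its in-memory interface are made up for illustration. It assumes comments have already been removed (as stated in the question), treats a backslash inside a string as escaping whatever character follows it (which covers \" as well as backslash-newline continuations), and tracks character literals so that '"' is not mistaken for the start of a string.

```c
#include <stddef.h>

/* Hypothetical helper: copy src to dst, dropping every double-quoted
 * string literal (quotes included). dst must have room for
 * strlen(src) + 1 bytes. Returns the number of bytes written,
 * not counting the terminating NUL.
 * Assumption: comments were stripped beforehand. */
size_t strip_strings(const char *src, char *dst)
{
    enum { CODE, STRING, CHAR_LIT } state = CODE;
    size_t n = 0;

    for (; *src != '\0'; src++) {
        char c = *src;

        switch (state) {
        case CODE:
            if (c == '"') {
                state = STRING;          /* drop the opening quote */
            } else {
                if (c == '\'')
                    state = CHAR_LIT;    /* so that '"' is left alone */
                dst[n++] = c;
            }
            break;

        case STRING:
            if (c == '\\' && src[1] != '\0')
                src++;                   /* skip escaped char: \" \\ or backslash-newline */
            else if (c == '"')
                state = CODE;            /* drop the closing quote */
            break;

        case CHAR_LIT:
            dst[n++] = c;
            if (c == '\\' && src[1] != '\0')
                dst[n++] = *++src;       /* copy escaped char, e.g. the ' in '\'' */
            else if (c == '\'')
                state = CODE;
            break;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Fed the lines from the question, this turns printf("Hello %s\n", place); into printf(, place); and collapses the three-line multiline_str initializer to a single line. One known limitation: it also removes the file name from #include "file.h" lines, so those would need separate handling if they matter.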