Tags: awk, sed, refactoring, automated-refactoring

Is there a C/C++-grammar-aware search/replace command line tool for linux?


I often use tools like sed or awk to replace content in text files. However, I find them very difficult to work with when it comes to replacing syntax elements in C/C++ source code, for example, extracting or adding an argument of a function call.

Let's say that I have the following call to a function, named addSymbol:

addSymbol(Position(441,243),4,7,bigFont,smallFont);

And I want to do the following:

- get the arguments of the Position constructor call (441 and 243)

- get the second argument of the addSymbol function call (the 4)

Now, getting the Position arguments in my awk script looks like this:

pos=gensub(/.*Position\(([^,]*),([^,]*)\).*/,"\\1,\\2",$0);

Parsing this with awk or sed simply by counting parentheses and commas looks difficult, because of how these tools, and the regular expressions they rely on, work:

  1. They read the file line by line. But anything in the source code may be split across multiple lines, which makes it very difficult to parse, even though it is perfectly valid C/C++ code.

  2. They have no concept of scope or parenthesis levels. In this example, I cannot simply count the commas, because the first comma is part of the constructor call that forms the first function argument and should not be counted. The tool would have to keep track of the current parenthesis level and count arguments only at that level.

  3. They have no concept of context. The code may contain comments or string literals, which must be ignored when parsing syntax elements. The tool would have to track the current context and ignore anything inside comments or string literals.

  4. Spaces before and after arguments have to be ignored. They make regular expressions more complicated, but they don't matter when parsing source code.

In other words, something like this would be very hard to parse with the tools that I know:

addSymbol(Position(441,243),"some,string",4/*was 5, before*/,7,bigFont,smallFont);
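For illustration, the bookkeeping described above (parenthesis depth, comments, string literals, surrounding whitespace, line breaks) can be done by a small hand-written scanner. The Python sketch below is not a real C/C++ parser: `split_call_args` is a hypothetical helper name, the call is located with a plain regular expression (so an occurrence inside a comment would also match), and macros, raw string literals and digit separators such as `1'000` are not handled.

```python
import re


def split_call_args(source, func_name):
    """Return the top-level argument texts of the first call to func_name.

    Returns None if no call is found. Comments are dropped; string and
    character literals are copied verbatim.
    """
    match = re.search(r"\b" + re.escape(func_name) + r"\s*\(", source)
    if match is None:
        return None

    args, current, depth = [], [], 0
    i, n = match.end(), len(source)
    while i < n:
        ch = source[i]
        two = source[i:i + 2]
        if two == "/*":                      # block comment: skip entirely
            end = source.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated comment")
            i = end + 2
            continue
        if two == "//":                      # line comment: skip to end of line
            end = source.find("\n", i)
            i = n if end == -1 else end
            continue
        if ch in "\"'":                      # literal: copy up to the closing quote
            j = i + 1
            while j < n and source[j] != ch:
                j += 2 if source[j] == "\\" else 1
            current.append(source[i:j + 1])
            i = j + 1
            continue
        if ch in "([{":
            depth += 1
        elif ch in ")]}":
            if depth == 0:                   # closing parenthesis of the call itself
                last = "".join(current).strip()
                if last or args:
                    args.append(last)
                return args
            depth -= 1
        elif ch == "," and depth == 0:       # separator at the call's own level
            args.append("".join(current).strip())
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    raise ValueError("unbalanced parentheses")


line = 'addSymbol(Position(441,243),"some,string",4/*was 5, before*/,7,bigFont,smallFont);'
print(split_call_args(line, "addSymbol"))
# ['Position(441,243)', '"some,string"', '4', '7', 'bigFont', 'smallFont']
```

Because the scanner works on the whole text rather than line by line, a call split across several lines is handled the same way, provided the entire file is read into one string first.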

Is there any tool designed specifically for parsing source code? Something where I can write a script; I imagine it could look like this:

functionCall = getFunctionCall("addSymbol");
symCount = functionCall.getArg(1);
firstArg = functionCall.getArg(0);
if (firstArg.name == "Position" && firstArg.argsCount==2) { //we are looking for a Position constructor call with 2 arguments
    pos_x=firstArg.getArg(0);
    pos_y=firstArg.getArg(1);
}
}

//if we want to remove an argument and output the edited string
functionCall.removeArg(0);
functionCall.print();
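To make the imagined script concrete, here is a minimal Python sketch of such an interface. Everything in it is hypothetical (`Call`, `parse_call`, `get_arg`, `remove_arg` are illustrative names, not an existing tool). It uses a regular-expression tokenizer so that commas and parentheses inside comments and literals are never treated as structure. It makes no attempt to handle macros, templates or raw string literals, and the printed call is normalized rather than layout-preserving.

```python
import re

# One alternative per lexical class. Comments and literals are tried first,
# so punctuation inside them is consumed as part of that token.
TOKEN = re.compile(r"""
    (?P<comment>/\*.*?\*/|//[^\n]*)
  | (?P<literal>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<open>\()
  | (?P<close>\))
  | (?P<comma>,)
  | (?P<other>[^/"'(),]+|/)
""", re.VERBOSE | re.DOTALL)


class Call:
    def __init__(self, name, args, span):
        self.name = name      # function name
        self.args = args      # list of argument source texts
        self.span = span      # (start, end) offsets of the call in the parsed text

    def get_arg(self, index):
        """Return a nested Call if the whole argument is a call, else its text."""
        text = self.args[index]
        nested = parse_call(text)
        if nested is not None and nested.span == (0, len(text)):
            return nested
        return text

    def remove_arg(self, index):
        del self.args[index]

    def __str__(self):
        return f"{self.name}({', '.join(self.args)})"


def parse_call(source, name=None):
    """Parse the first call to `name` (or to any identifier if name is None)."""
    ident = re.escape(name) if name else r"[A-Za-z_]\w*"
    head = re.search(rf"\b({ident})\s*\(", source)
    if head is None:
        return None
    args, current, depth = [], [], 0
    for tok in TOKEN.finditer(source, head.end()):
        kind, text = tok.lastgroup, tok.group()
        if kind == "comment":
            continue                          # comments never reach the arguments
        if kind == "open":
            depth += 1
        elif kind == "close":
            if depth == 0:                    # end of the call being parsed
                last = "".join(current).strip()
                if last or args:
                    args.append(last)
                return Call(head.group(1), args, (head.start(), tok.end()))
            depth -= 1
        elif kind == "comma" and depth == 0:  # argument separator at top level
            args.append("".join(current).strip())
            current = []
            continue
        current.append(text)
    return None                               # closing parenthesis not found


call = parse_call("addSymbol(Position(441,243),4,7,bigFont,smallFont);", "addSymbol")
sym_count = call.get_arg(1)                   # "4"
first = call.get_arg(0)
if isinstance(first, Call) and first.name == "Position" and len(first.args) == 2:
    pos_x, pos_y = first.get_arg(0), first.get_arg(1)   # "441", "243"
call.remove_arg(0)
print(call)                                   # addSymbol(4, 7, bigFont, smallFont)
```

Writing the edited call back into the file would additionally need the stored `span` to splice the new text into the original source, which this sketch leaves out.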

I am not looking for something complicated that parses included files, macros or templates, or tracks references to variables. Operating on a single .c/.cpp file is fully sufficient.

Such a tool should not be anything new because, after all, it should work the same way the compiler does when it parses the source code during compilation.

I looked at programs like cscope and ctags, but they look far more complicated and require all included files in order to parse the entire project.

Is there any simple tool, like awk or sed, but designed specifically for parsing source code elements?


Solution

  • Not at the command-line level, but a Program Transformation System (PTS) can do this kind of thing. You give it scripts in the form of rewrite rules.

    Our DMS Software Reengineering Toolkit is a PTS that handles full C++17 and will enable you to write transformation rules. DMS's parsing machinery takes care of all the complications of whitespace, line breaks, formatting, comments, number radix, character sets... lots of things which seem stupid but prevent real work from getting done. We've also put a lot of energy into handling preprocessor conditionals, macros and include files.

    For your specific example, the following script seems appropriate.

     domain Cpp~ISO14882c2017; -- making it clear which dialect to use
    
     rule NuclearEdit(x:exp,y:exp,symcount:exp, arg3:exp, arg4:exp): statement->statement
     =  "addSymbol(Position(\x,\y),\symcount,\arg3,\arg4);"
     -> "addSymbol(\symcount,\arg3,\arg4);" .
    

    That's a lot easier than a procedural script.

    You can tell DMS to parse a C++ source file, apply the rule script, and prettyprint the answer. It will preserve layout, comments, number radix, etc. where the transformations have not been applied.

    This is a pretty small example. DMS has been used on large (millions of lines of C++ code) systems to carry out API refactorings at scale.