python c macros c-preprocessor pyparsing

Python parser: find where macros are defined


I have a test header file named header2.h that contains only:

#define ford 15

and another header, header1.h, which contains:

#include "header2.h"
#define make ford
#define car_age 10

I'm trying to write a Python script that uses the pyparsing library to parse C header files. Suppose I want to verify that the macro car_age is defined: I parse header1.h and check that car_age exists. If I want to verify that #define make ford is valid, I first have to parse header2.h to make sure ford is defined.

My Python script works, but I have several header files that use definitions from other header files, so the process gets very cumbersome.

I don't think pyparsing has a feature that solves this. Is there another Python parsing library, or a different tool, that can verify that a macro is defined, whether in the same header file or in another C header file, as in my example? Thanks.


Solution

  • The key to this design is to use parse actions to maintain a translation table of all the macros, and to dynamically update the pyparsing Forward expression that matches macro names, so that references are substituted as they are encountered.
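
    To make the translation-table idea concrete, here is a minimal sketch that uses only the stdlib re module as a stand-in for pyparsing. It is not the macroExpander.py code: the names expand_defines and DEFINE_RE are hypothetical, it only handles object-like macros that have a value, and it will also substitute inside string literals. The regex that is rebuilt after each #define plays the role of the Forward expression.

```python
import re

# Hypothetical helper names; this is a regex-based illustration, not pyparsing.
DEFINE_RE = re.compile(r"^\s*#define\s+(\w+)\s+(.*)$")


def expand_defines(text):
    """Return (expanded_text, macros) for a string of C-like source."""
    macros = {}      # translation table: macro name -> expanded value
    name_re = None   # matcher for known macro names, rebuilt on each #define
    out = []

    def substitute(s):
        # replace every reference to a known macro with its value
        if name_re is None:
            return s
        return name_re.sub(lambda m: macros[m.group(0)], s)

    for line in text.splitlines():
        m = DEFINE_RE.match(line)
        if m:
            # expand the value first, so "make" is stored as "15", not "ford"
            name, value = m.group(1), substitute(m.group(2).strip())
            macros[name] = value
            # rebuild the matcher so that later lines see the new macro
            names = "|".join(map(re.escape, macros))
            name_re = re.compile(r"\b(?:" + names + r")\b")
            out.append(f"#define {name} {value}")
        else:
            out.append(substitute(line))
    return "\n".join(out), macros


source = "#define ford 15\n#define make ford\nint x = make;"
expanded, macros = expand_defines(source)
print(expanded)
print(macros)
```

    Because each value is expanded before it is stored, a macro defined in terms of an earlier macro ends up in the table already resolved.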

    Starting with the macroExpander.py example, we add another expression to detect #include directives, a parse action to process them, and a stack to guard against cyclic includes.

    macroInclude = "#include" + pp.quoted_string("include_file_reference").add_parse_action(pp.remove_quotes)
    
    # global stack for include processing (to guard against cyclic includes)
    include_stack = []
    
    def process_include(s, l, t):
        filename = t.include_file_reference
        if filename in include_stack:
            raise ValueError(f"cyclic reference to {filename!r}")
    
        # print(f"processing file {filename!r}")
        include_stack.append(filename)
        try:
            resolved_file = resolve_file(filename)
            if resolved_file is not None:
                # searching for matches will update the macros dict
                macroExpander.search_string(resolved_file.read_text())
        finally:
            # all done with this include file, pop from the stack
            # (even if processing a nested include raised an error)
            include_stack.pop()
        return " ".join(t)
    
    macroInclude.add_parse_action(process_include)
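
    The include-stack guard can be tried in isolation. The following is a rough sketch under stated assumptions: it uses stdlib regexes instead of pyparsing, reads "files" from an in-memory dict rather than from disk, and only resolves a macro whose whole value is the name of another macro. The names collect_macros, INCLUDE_RE and DEFINE_RE are hypothetical.

```python
import re

# Hypothetical names; "files" maps a file name to its text (no disk access).
INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"')
DEFINE_RE = re.compile(r"^\s*#define\s+(\w+)\s+(\S.*)$")


def collect_macros(filename, files, macros=None, include_stack=None):
    """Collect #define values from filename and everything it includes."""
    macros = {} if macros is None else macros
    include_stack = [] if include_stack is None else include_stack

    if filename in include_stack:
        raise ValueError(f"cyclic reference to {filename!r}")

    include_stack.append(filename)
    try:
        # unknown files are silently skipped, like a None from resolve_file
        for line in files.get(filename, "").splitlines():
            inc = INCLUDE_RE.match(line)
            if inc:
                # recurse into the included file before reading further
                collect_macros(inc.group(1), files, macros, include_stack)
                continue
            d = DEFINE_RE.match(line)
            if d:
                name, value = d.group(1), d.group(2).strip()
                # resolve a value that is itself a known macro name
                macros[name] = macros.get(value, value)
    finally:
        # pop even if a nested include raised, so the stack stays consistent
        include_stack.pop()
    return macros


files = {
    "header1.h": '#include "header2.h"\n#define make ford\n#define car_age 10\n',
    "header2.h": "#define ford 15\n",
}
print(collect_macros("header1.h", files))
```

    Changing the include in header1.h to point at header1.h itself makes collect_macros raise ValueError instead of recursing forever.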
    

    Add macroInclude to the macroExpander expression:

    # define pattern for scanning through the input string
    macroExpander = macroExpr | macroDef | macroInclude
    

    I also added this line so that the parser skips over any code that has been commented out:

    # ignore comments
    macroExpander.ignore(pp.c_style_comment)
    

    Here is some test code that creates your sample files in a temporary directory, using the stdlib tempfile module:

    from tempfile import TemporaryDirectory
    from pathlib import Path
    from textwrap import dedent
    
    # header1.h contents
    file1 = dedent("""\
    /* change to header1.h to see handling of cyclic include */
    #include "header2.h"
    #define make ford
    #define car_age 10
    """)
    
    # header2.h contents
    file2 = dedent("""\
    #define ford 15
    """)
    
    # program.c contents
    file3 = dedent("""\
    #include "header1.h"
    #include <stdio.h>
    
    printf("My car is a ", make, " it is ", car_age, " years old!\\n");
    """)
    
    # create a temporary dir for header files
    with TemporaryDirectory() as tmpdir_str:
        tmpdir = Path(tmpdir_str)
    
        def resolve_file(fname):
            ret = tmpdir / fname
            if ret.exists():
                return ret
            else:
                return None
    
        (tmpdir / "header1.h").write_text(file1)
        (tmpdir / "header2.h").write_text(file2)
        (tmpdir / "program.c").write_text(file3)
    
        expanded = macroExpander.transform_string((tmpdir / "program.c").read_text())
        print(expanded)
        print(macros)
    

    Running this, I get the following:

    #include header1.h
    #include <stdio.h>
    
    printf("My car is a ", 15, " it is ", 10, " years old!\n");
    
    {'ford': '15', 'make': '15', 'car_age': '10'}
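
    Once the macros dict has been populated, checking whether particular macros are defined anywhere in the include chain is a dictionary lookup. A small sketch, where undefined_macros is a hypothetical helper and the dict is copied from the output above:

```python
def undefined_macros(macros, required):
    """Return the names in required that have no entry in macros."""
    return [name for name in required if name not in macros]


# the translation table printed by the run above
macros = {'ford': '15', 'make': '15', 'car_age': '10'}

# "model" is a made-up name used only to show a failing lookup
missing = undefined_macros(macros, ["car_age", "make", "model"])
print(missing)
```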