
Trying to write a C program to remove all comments from a C program


I am trying to write a program that removes all comments from a C program. I believe this is a common exercise in C books. The code is as follows:

    #include <stdio.h>
    #include <conio.h>

    #define ON 1
    #define OFF 0
    
    int main()
    {
        FILE* fs, * ft;
        int c = 1, c_prev = 1, slcomment = OFF, mlcomment = OFF, d_quotes = OFF;
    
        fs = fopen("source.c", "rb");
        ft = fopen("target.c", "wb");
    
        c_prev = fgetc(fs);
    
        while ((c = fgetc(fs)) != EOF)
        {
            if (c == '"' && d_quotes == OFF)
                d_quotes = ON;
    
            if (c == '"' && d_quotes == ON)
                d_quotes = OFF;
    
            if (c_prev == '/')
            {
                if (c == '/' && d_quotes == OFF)
                    slcomment = ON;
    
                else if (c == '*' && d_quotes == OFF)
                    mlcomment = ON;
    
                else
                    fputc(c_prev, ft);
            }
    
            if (c == '\n' && slcomment == ON)
                slcomment = OFF;
    
            if (c_prev == '*' && c == '/' && mlcomment == ON && d_quotes == OFF)
            {
                mlcomment = OFF;
                c = fgetc(fs);
            }
    
            if (c != '/' && mlcomment == OFF && slcomment == OFF && d_quotes == OFF)
                fputc(c, ft);
    
            c_prev = c;
        }
    
        fclose(fs);
        fclose(ft);
        printf("Program, after removal of comments, has been copied in target.c...\n");
    
        getch();
        return 0;
    }

The program seems to work fine, except that it can't remove the first '/' of a single-line comment. I can't figure out what is going wrong. Please point out the mistake. Thank you all in advance.


Solution

  • If you're willing to forgo the requirement to implement this in C, I recommend using Antlr, the C grammar from the grammars-v4 repo, and the Trash toolkit to strip comments in a turnkey manner.

    Step 1 -- Clone the grammar repo

    $ git clone https://github.com/antlr/grammars-v4.git
    Cloning into 'grammars-v4'...
    remote: Enumerating objects: 50591, done.
    remote: Counting objects: 100% (1880/1880), done.
    remote: Compressing objects: 100% (1267/1267), done.
    remote: Total 50591 (delta 675), reused 1605 (delta 501), pack-reused 48711
    Receiving objects: 100% (50591/50591), 47.49 MiB | 22.49 MiB/s, done.
    Resolving deltas: 100% (27096/27096), done.
    Updating files: 100% (9413/9413), done.
    

    Step 2 -- Generate a parser for C

    $ cd grammars-v4/c
    $ trgen -t CSharp
    C:\msys64\home\Kenne\temp\grammars-v4\c
    CSharp  C.g4 success 0.0547772
    Rendering template file from CSharp/Other.csproj to ./Generated-CSharp/Other.csproj
    Rendering template file from CSharp/st.build.ps1 to ./Generated-CSharp/st.build.ps1
    Rendering template file from CSharp/st.build.sh to ./Generated-CSharp/st.build.sh
    Rendering template file from CSharp/st.clean.ps1 to ./Generated-CSharp/st.clean.ps1
    Rendering template file from CSharp/st.clean.sh to ./Generated-CSharp/st.clean.sh
    Rendering template file from CSharp/st.Encodings.cs to ./Generated-CSharp/st.Encodings.cs
    Rendering template file from CSharp/st.ErrorListener.cs to ./Generated-CSharp/st.ErrorListener.cs
    Rendering template file from CSharp/st.makefile to ./Generated-CSharp/st.makefile
    Rendering template file from CSharp/st.perf.sh to ./Generated-CSharp/st.perf.sh
    Rendering template file from CSharp/st.ProfilingCommonTokenStream.cs to ./Generated-CSharp/st.ProfilingCommonTokenStream.cs
    Rendering template file from CSharp/st.run.ps1 to ./Generated-CSharp/st.run.ps1
    Rendering template file from CSharp/st.run.sh to ./Generated-CSharp/st.run.sh
    Rendering template file from CSharp/st.test-cover.sh to ./Generated-CSharp/st.test-cover.sh
    Rendering template file from CSharp/st.Test.cs to ./Generated-CSharp/st.Test.cs
    Rendering template file from CSharp/st.test.ps1 to ./Generated-CSharp/st.test.ps1
    Rendering template file from CSharp/st.test.sh to ./Generated-CSharp/st.test.sh
    Rendering template file from CSharp/Test.csproj.st to ./Generated-CSharp/Test.csproj.st
    Copying source file from C:/msys64/home/Kenne/temp/grammars-v4/c/desc.xml to ./Generated-CSharp/desc.xml
    Copying source file from C:/msys64/home/Kenne/temp/grammars-v4/c/C.g4 to ./Generated-CSharp/C.g4
    $ cd Generated-CSharp/
    $ make
    bash build.sh
      Determining projects to restore...
      Restored C:\msys64\home\Kenne\temp\grammars-v4\c\Generated-CSharp\Test.csproj (in 564 ms).
      Determining projects to restore...
      All projects are up-to-date for restore.
      Test -> C:\msys64\home\Kenne\temp\grammars-v4\c\Generated-CSharp\bin\Debug\net8.0\Test.dll
    
    Build succeeded.
        0 Warning(s)
        0 Error(s)
    
    Time Elapsed 00:00:08.09
    

    Step 3 -- Run tools from command line to strip comments

    $ cat input.c
    // Includes for a bunch of stuff.
    
    #include <stdio.h>
    #include <conio.h>
    
    /* Lot's of #defines.... */
    #define ON 1
    #define OFF 0
    
    /*
     * the main program.
     */
    int main()
    {
            FILE* fs, * ft; // Bunches of files.
            int c = 1, c_prev = 1, slcomment = OFF, mlcomment = OFF, d_quotes = OFF;
    
            fs = fopen("source.c", "rb"); // open a bunch of files.
            ft = fopen("target.c", "wb");
    
            c_prev = fgetc(fs); // Read a char ------ bug
    
            // Read a char, until EOF.
            while ((c = fgetc(fs)) != EOF)
            {
                    if (c == '"' && d_quotes == OFF)
                    {
                            d_quotes = ON;
                            fputc (c, ft);
                            c = fgetc (fs);
                    } // end if
    
                    if (c == '"' && d_quotes == ON)
                    {
                            d_quotes = OFF;
                            fputc (c, ft);
                            c = fgetc (fs);
                    }
    
                    if (c_prev == '/')
                    {
                            if (c == '/' && d_quotes == OFF)
                                    slcomment = ON;
    
                            if (c == '*' && d_quotes == OFF)
                                    mlcomment = ON;
    
                            if (slcomment == OFF && mlcomment == OFF && d_quotes == OFF)
                                    fputc(c_prev, ft);
                    }
    
                    if (c == '\n' && slcomment == ON)
                            slcomment = OFF;
    
                    if (c_prev == '*' && c == '/' && mlcomment == ON && d_quotes == OFF)
                    {
                            mlcomment = OFF;
                            c = fgetc(fs);
                    }
    
                    if (c != '/' && mlcomment == OFF && slcomment == OFF && d_quotes == OFF)
                            fputc (c, ft);
    
                    if (d_quotes == ON)
                            fputc (c, ft);
    
                    c_prev = c;
            } // end-while
    
            fclose(fs); // Close files.
            fclose(ft);
            printf("Program, after removal of comments, has been copied in target.c...\n");
    
            getch();
            return 0;
    }
    $ trparse input.c | trquery delete ' //(@BlockComment | @LineComment)' | trsponge -o xxx -c
    CSharp 0 input.c success 0.0696287
    Writing to xxx/input.c
    $ diff input.c xxx/
    1c1
    < // Includes for a bunch of stuff.
    ---
    >
    6c6
    < /* Lot's of #defines.... */
    ---
    >
    10,12c10
    < /*
    <  * the main program.
    <  */
    ---
    >
    15c13
    <       FILE* fs, * ft; // Bunches of files.
    ---
    >       FILE* fs, * ft;
    18c16
    <       fs = fopen("source.c", "rb"); // open a bunch of files.
    ---
    >       fs = fopen("source.c", "rb");
    21c19
    <       c_prev = fgetc(fs); // Read a char ------ bug
    ---
    >       c_prev = fgetc(fs);
    23c21
    <       // Read a char, until EOF.
    ---
    >
    31c29
    <               } // end if
    ---
    >               }
    68c66
    <       } // end-while
    ---
    >       }
    70c68
    <       fclose(fs); // Close files.
    ---
    >       fclose(fs);
    

    How does this work?

    Antlr is a general-purpose parser generator that targets several programming languages and runs on multiple OSes. The grammars-v4 repo contains over 350 grammars (NB: many are not completely up to date with the current state of their languages, and the grammars may have bugs).

    Trash is a toolkit for manipulating Antlr parse trees, and it works with the grammars in the grammars-v4 repo. Some tools generate a parser application from a grammar (trgen above); others parse input and emit parse trees (trparse above). Additional tools use an XPath engine to find and manipulate nodes in the parse tree.

    For this example, the parse tree contains nodes for BlockComment and LineComment. The XPath expression //(@BlockComment | @LineComment) selects them (attributes in the parse tree are prefixed with @), and the trquery command deletes them. The modified parse tree is then written to a file using trsponge.
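
If the exercise has to stay in C, a small state machine tends to be easier to get right than a set of independent flags. As far as I can tell from the posted loop, the stray '/' appears because the second slash of `//` becomes `c_prev` on the next iteration and the `else` branch writes it out. The two back-to-back `d_quotes` checks also cancel each other, so that flag never stays ON. The sketch below is one possible approach, not the code from the question and not what the Trash pipeline does. `strip_comments` and the state names are hypothetical, it works on an in-memory string rather than on `FILE*` streams, it replaces each block comment with one space (as the C standard's translation phases do), and it ignores backslash-newline continuations and trigraphs.

```c
#include <stddef.h>

/* One state per lexical context the scanner can be in. */
enum state {
    CODE,           /* ordinary source text */
    SLASH,          /* saw '/', not yet known whether a comment starts */
    LINE_COMMENT,   /* inside a // comment */
    BLOCK_COMMENT,  /* inside a block comment */
    BLOCK_STAR,     /* inside a block comment, previous char was '*' */
    STRING,         /* inside "..." */
    STRING_ESC,     /* inside "...", previous char was a backslash */
    CHAR_LIT,       /* inside '...' */
    CHAR_ESC        /* inside '...', previous char was a backslash */
};

/*
 * Copy src to dst without comments and return the number of bytes written
 * (not counting the terminating NUL). dst must hold strlen(src) + 1 bytes.
 */
size_t strip_comments(const char *src, char *dst)
{
    enum state st = CODE;
    size_t n = 0;

    for (; *src != '\0'; src++) {
        char c = *src;

        switch (st) {
        case CODE:
            if (c == '/') {          /* hold the slash back for now */
                st = SLASH;
                break;
            }
            if (c == '"')
                st = STRING;
            else if (c == '\'')
                st = CHAR_LIT;
            dst[n++] = c;
            break;

        case SLASH:
            if (c == '/') {
                st = LINE_COMMENT;
            } else if (c == '*') {
                st = BLOCK_COMMENT;
            } else {
                dst[n++] = '/';      /* it was a plain '/' after all */
                st = CODE;
                src--;               /* look at c again in CODE state */
            }
            break;

        case LINE_COMMENT:
            if (c == '\n') {         /* keep the newline that ends the comment */
                dst[n++] = c;
                st = CODE;
            }
            break;

        case BLOCK_COMMENT:
            if (c == '*')
                st = BLOCK_STAR;
            break;

        case BLOCK_STAR:
            if (c == '/') {
                dst[n++] = ' ';      /* the whole comment becomes one space */
                st = CODE;
            } else if (c != '*') {
                st = BLOCK_COMMENT;
            }
            break;

        case STRING:
            dst[n++] = c;
            if (c == '\\')
                st = STRING_ESC;
            else if (c == '"')
                st = CODE;
            break;

        case STRING_ESC:             /* escaped char cannot end the string */
            dst[n++] = c;
            st = STRING;
            break;

        case CHAR_LIT:
            dst[n++] = c;
            if (c == '\\')
                st = CHAR_ESC;
            else if (c == '\'')
                st = CODE;
            break;

        case CHAR_ESC:               /* escaped char cannot end the literal */
            dst[n++] = c;
            st = CHAR_LIT;
            break;
        }
    }

    if (st == SLASH)                 /* input ended on a lone '/' */
        dst[n++] = '/';
    dst[n] = '\0';
    return n;
}
```

Called as `strip_comments("x = a/**/b; // done\n", out)`, with `out` at least one byte larger than the input, this leaves `x = a b; \n` in `out`. The same switch can be driven by `fgetc`/`fputc` if you prefer to stream from `source.c` to `target.c` as in the question. In that case the "look at c again" step needs `ungetc` or an explicit re-dispatch.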