I am trying to write a program that removes all comments from a C program. I believe this is a common exercise in most C books. The code is as follows:
#include <stdio.h>
#include <conio.h>
#define ON 1
#define OFF 0
int main()
{
FILE* fs, * ft;
int c = 1, c_prev = 1, slcomment = OFF, mlcomment = OFF, d_quotes = OFF;
fs = fopen("source.c", "rb");
ft = fopen("target.c", "wb");
c_prev = fgetc(fs);
while ((c = fgetc(fs)) != EOF)
{
if (c == '"' && d_quotes == OFF)
d_quotes = ON;
if (c == '"' && d_quotes == ON)
d_quotes = OFF;
if (c_prev == '/')
{
if (c == '/' && d_quotes == OFF)
slcomment = ON;
else if (c == '*' && d_quotes == OFF)
mlcomment = ON;
else
fputc(c_prev, ft);
}
if (c == '\n' && slcomment == ON)
slcomment = OFF;
if (c_prev == '*' && c == '/' && mlcomment == ON && d_quotes == OFF)
{
mlcomment = OFF;
c = fgetc(fs);
}
if (c != '/' && mlcomment == OFF && slcomment == OFF && d_quotes == OFF)
fputc(c, ft);
c_prev = c;
}
fclose(fs);
fclose(ft);
printf("Program, after removal of comments, has been copied in target.c...\n");
getch();
return 0;
}
The program seems to work fine, except that it can't remove the first '/' of a single-line comment. I just can't figure out what is going wrong. Please point out the mistake. Thank you all in advance.
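Tracing the posted loop by hand suggests where the stray '/' comes from. Once `//` has switched slcomment on, the second slash is stored in c_prev, so on the next character the `if (c_prev == '/')` block falls through to its else branch and fputc(c_prev, ft) writes a '/'. Separately, the two d_quotes tests run back to back, so an opening quote turns the flag on and the very next if turns it off again. One way to sidestep this kind of flag interaction is an explicit state machine. The following is only a minimal sketch under stated assumptions: strip_comments is a hypothetical name, it works on an in-memory string instead of FILE* to stay short, and it ignores backslash-newline continuations and trigraphs. Each block comment is replaced by a single space, which is how the C standard treats comments during translation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Which kind of text the scanner is currently inside. */
enum state { CODE, SLASH, LINE_COMMENT, BLOCK_COMMENT, BLOCK_STAR, STRING, CHAR_LIT };

/* Hypothetical helper (not from the question): returns a malloc'd copy of
 * src with comments removed, or NULL if allocation fails. Caller frees. */
char *strip_comments(const char *src)
{
    assert(src != NULL);
    /* The output is never longer than the input, so one buffer suffices. */
    char *dst = malloc(strlen(src) + 1);
    if (dst == NULL)
        return NULL;

    enum state st = CODE;
    size_t n = 0;

    for (const char *p = src; *p != '\0'; p++) {
        char c = *p;
        switch (st) {
        case CODE:
            if (c == '/') {
                st = SLASH;               /* hold the slash back for now */
            } else {
                if (c == '"')       st = STRING;
                else if (c == '\'') st = CHAR_LIT;
                dst[n++] = c;
            }
            break;
        case SLASH:
            if (c == '/') {
                st = LINE_COMMENT;        /* both slashes are dropped */
            } else if (c == '*') {
                st = BLOCK_COMMENT;
            } else {
                dst[n++] = '/';           /* it was an ordinary slash */
                dst[n++] = c;
                st = (c == '"') ? STRING : (c == '\'') ? CHAR_LIT : CODE;
            }
            break;
        case LINE_COMMENT:
            if (c == '\n') {              /* keep the newline itself */
                dst[n++] = c;
                st = CODE;
            }
            break;
        case BLOCK_COMMENT:
            if (c == '*')
                st = BLOCK_STAR;
            break;
        case BLOCK_STAR:
            if (c == '/') {
                dst[n++] = ' ';           /* comment becomes one space */
                st = CODE;
            } else if (c != '*') {
                st = BLOCK_COMMENT;
            }
            break;
        case STRING:
        case CHAR_LIT:
            dst[n++] = c;
            if (c == '\\' && p[1] != '\0')
                dst[n++] = *++p;          /* copy the escaped character too */
            else if (c == (st == STRING ? '"' : '\''))
                st = CODE;
            break;
        }
    }
    if (st == SLASH)
        dst[n++] = '/';                   /* input ended on a lone slash */
    dst[n] = '\0';
    return dst;
}
```

To use it with the files from the question, you could read source.c into a NUL-terminated buffer, pass that buffer to strip_comments, write the returned string to target.c, and free it. Holding the slash back in its own state is what avoids the extra '/': nothing is written until the next character shows whether a comment actually started.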
If you're willing to forgo the requirement to implement this using C, I recommend using Antlr, the grammar for C, and the Trash toolkit to strip comments in a canned, turnkey manner.
$ git clone https://github.com/antlr/grammars-v4.git
Cloning into 'grammars-v4'...
remote: Enumerating objects: 50591, done.
remote: Counting objects: 100% (1880/1880), done.
remote: Compressing objects: 100% (1267/1267), done.
remote: Total 50591 (delta 675), reused 1605 (delta 501), pack-reused 48711
Receiving objects: 100% (50591/50591), 47.49 MiB | 22.49 MiB/s, done.
Resolving deltas: 100% (27096/27096), done.
Updating files: 100% (9413/9413), done.
$ cd grammars-v4/c
$ trgen -t CSharp
C:\msys64\home\Kenne\temp\grammars-v4\c
CSharp C.g4 success 0.0547772
Rendering template file from CSharp/Other.csproj to ./Generated-CSharp/Other.csproj
Rendering template file from CSharp/st.build.ps1 to ./Generated-CSharp/st.build.ps1
Rendering template file from CSharp/st.build.sh to ./Generated-CSharp/st.build.sh
Rendering template file from CSharp/st.clean.ps1 to ./Generated-CSharp/st.clean.ps1
Rendering template file from CSharp/st.clean.sh to ./Generated-CSharp/st.clean.sh
Rendering template file from CSharp/st.Encodings.cs to ./Generated-CSharp/st.Encodings.cs
Rendering template file from CSharp/st.ErrorListener.cs to ./Generated-CSharp/st.ErrorListener.cs
Rendering template file from CSharp/st.makefile to ./Generated-CSharp/st.makefile
Rendering template file from CSharp/st.perf.sh to ./Generated-CSharp/st.perf.sh
Rendering template file from CSharp/st.ProfilingCommonTokenStream.cs to ./Generated-CSharp/st.ProfilingCommonTokenStream.cs
Rendering template file from CSharp/st.run.ps1 to ./Generated-CSharp/st.run.ps1
Rendering template file from CSharp/st.run.sh to ./Generated-CSharp/st.run.sh
Rendering template file from CSharp/st.test-cover.sh to ./Generated-CSharp/st.test-cover.sh
Rendering template file from CSharp/st.Test.cs to ./Generated-CSharp/st.Test.cs
Rendering template file from CSharp/st.test.ps1 to ./Generated-CSharp/st.test.ps1
Rendering template file from CSharp/st.test.sh to ./Generated-CSharp/st.test.sh
Rendering template file from CSharp/Test.csproj.st to ./Generated-CSharp/Test.csproj.st
Copying source file from C:/msys64/home/Kenne/temp/grammars-v4/c/desc.xml to ./Generated-CSharp/desc.xml
Copying source file from C:/msys64/home/Kenne/temp/grammars-v4/c/C.g4 to ./Generated-CSharp/C.g4
$ cd Generated-CSharp/
$ make
bash build.sh
Determining projects to restore...
Restored C:\msys64\home\Kenne\temp\grammars-v4\c\Generated-CSharp\Test.csproj (in 564 ms).
Determining projects to restore...
All projects are up-to-date for restore.
Test -> C:\msys64\home\Kenne\temp\grammars-v4\c\Generated-CSharp\bin\Debug\net8.0\Test.dll
Build succeeded.
0 Warning(s)
0 Error(s)
Time Elapsed 00:00:08.09
$ cat input.c
// Includes for a bunch of stuff.
#include <stdio.h>
#include <conio.h>
/* Lot's of #defines.... */
#define ON 1
#define OFF 0
/*
* the main program.
*/
int main()
{
FILE* fs, * ft; // Bunches of files.
int c = 1, c_prev = 1, slcomment = OFF, mlcomment = OFF, d_quotes = OFF;
fs = fopen("source.c", "rb"); // open a bunch of files.
ft = fopen("target.c", "wb");
c_prev = fgetc(fs); // Read a char ------ bug
// Read a char, until EOF.
while ((c = fgetc(fs)) != EOF)
{
if (c == '"' && d_quotes == OFF)
{
d_quotes = ON;
fputc (c, ft);
c = fgetc (fs);
} // end if
if (c == '"' && d_quotes == ON)
{
d_quotes = OFF;
fputc (c, ft);
c = fgetc (fs);
}
if (c_prev == '/')
{
if (c == '/' && d_quotes == OFF)
slcomment = ON;
if (c == '*' && d_quotes == OFF)
mlcomment = ON;
if (slcomment == OFF && mlcomment == OFF && d_quotes == OFF)
fputc(c_prev, ft);
}
if (c == '\n' && slcomment == ON)
slcomment = OFF;
if (c_prev == '*' && c == '/' && mlcomment == ON && d_quotes == OFF)
{
mlcomment = OFF;
c = fgetc(fs);
}
if (c != '/' && mlcomment == OFF && slcomment == OFF && d_quotes == OFF)
fputc (c, ft);
if (d_quotes == ON)
fputc (c, ft);
c_prev = c;
} // end-while
fclose(fs); // Close files.
fclose(ft);
printf("Program, after removal of comments, has been copied in target.c...\n");
getch();
return 0;
}
$ trparse input.c | trquery delete ' //(@BlockComment | @LineComment)' | trsponge -o xxx -c
CSharp 0 input.c success 0.0696287
Writing to xxx/input.c
$ diff input.c xxx/
1c1
< // Includes for a bunch of stuff.
---
>
6c6
< /* Lot's of #defines.... */
---
>
10,12c10
< /*
< * the main program.
< */
---
>
15c13
< FILE* fs, * ft; // Bunches of files.
---
> FILE* fs, * ft;
18c16
< fs = fopen("source.c", "rb"); // open a bunch of files.
---
> fs = fopen("source.c", "rb");
21c19
< c_prev = fgetc(fs); // Read a char ------ bug
---
> c_prev = fgetc(fs);
23c21
< // Read a char, until EOF.
---
>
31c29
< } // end if
---
> }
68c66
< } // end-while
---
> }
70c68
< fclose(fs); // Close files.
---
> fclose(fs);
Antlr is a general-purpose parser generator that targets several programming languages and runs on multiple operating systems. The grammars-v4 repo contains over 350 grammars (NB: many are not fully up to date with the current state of their languages, and some may have bugs).
Trash is a toolkit for manipulating Antlr parse trees, and it works with the grammars in the grammars-v4 repo. Some of its tools generate a parser application from a grammar, parse input, and output parse trees. Others use an XPath engine to find and manipulate nodes in the parse tree.
For this example, the parse tree contains nodes for BlockComment and LineComment. The trquery command deletes them, selecting the nodes with the XPath expression //(@BlockComment | @LineComment) (attributes in the parse tree are prefixed with @). The modified parse tree is then written to a file using trsponge.