regex compiler-construction bison lex

Complete Comments REGEX for LEX


I am working on building a calculator compiler using Lex and Yacc. The idea is based upon the following resource: http://epaperpress.com/lexandyacc/index.html.

For the given input file I need to identify all of the comments:

//.TEST -- JWJ
//.Step final  -- testing all requirements
//.source: test-1m.cal
//.expected output: test-1m_expected.out

/**
 *  This program will use Newton's method to estimate the roots of


 This should be a comment as well, but does not get picked up


 *  f(x) = x^3 - 3*x 
 */
 float xn;
 float xo;
// int num_iterations;
 xo = 3.0;
 xn = 3.0;
 num_iterations = 1;

 /* A do-while loop */
 do {
  print xo;
  xo = xn;
  xn = xo - ( xo * xo * xo - 3.0 * xo  ) / ( 3.0 * xo * xo - 3.0);
  num_iterations = num_iterations + 1;
} while ( num_iterations <= 6 )

print xn; // The root found using Newton's method.
print (xo * xo * xo - 3.0 * xo ); // Print f(xn), which should be 0.

I am using the following regular expressions in my lex file:

"//"[^\n]*|"\/\*".*"\*\/"
"\/\*"([^\n])*  
(.)*"\*\/"  

I do not understand why multi-line comments are not being matched. Could someone please offer some insight?


Solution

  • The . character in flex matches any character EXCEPT a newline (so it's the same as [^\n]). As a result, none of your regexes can match a comment that contains a newline.

    The usual regex for a C-style comment is:

    "/*"([^*]|\*+[^*/])*\*+"/"
    

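    Inside the comment markers, this matches zero or more repetitions of either "any character except *" or "one or more *s followed by a character that is neither * nor /"; the trailing \*+"/" then consumes the closing run of *s and the final /. Unlike ., a negated character class such as [^*] does match a newline, which is why this pattern can span multiple lines.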
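
To see the difference without rebuilding the scanner, here is a minimal sketch in Python rather than flex. It assumes the flex patterns carry over to Python's `re` module by dropping the quotes and escaping the literal `*`; Python's `.` also skips newlines by default, and its negated classes also match them. The names `C_COMMENT`, `DOT_COMMENT`, `SAMPLE`, and `find_comments` are made up for this illustration, and the sample is a shortened copy of the input from the question.

```python
import re

# Assumed translation of the flex pattern "/*"([^*]|\*+[^*/])*\*+"/"
C_COMMENT = re.compile(r"/\*([^*]|\*+[^*/])*\*+/")
# Assumed translation of the question's block-comment pattern "\/\*".*"\*\/"
DOT_COMMENT = re.compile(r"/\*.*\*/")

# Shortened copy of the input file from the question.
SAMPLE = """/**
 *  This program will use Newton's method to estimate the roots of

 This should be a comment as well, but does not get picked up

 *  f(x) = x^3 - 3*x
 */
 float xn;
 /* A do-while loop */
 do {
"""


def find_comments(pattern, text):
    """Return every substring of `text` matched by `pattern`."""
    return [m.group(0) for m in pattern.finditer(text)]


dot_matches = find_comments(DOT_COMMENT, SAMPLE)
full_matches = find_comments(C_COMMENT, SAMPLE)

print(len(dot_matches))   # 1: only the single-line /* ... */ comment
print(len(full_matches))  # 2: the multi-line comment and the single-line one
```

The `.`-based pattern stops at the first newline, so it never reaches the closing `*/` of the multi-line comment, while the negated-class pattern matches it in full. In the actual lex file, the equivalent rule would be the quoted form shown in the answer.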