unit-testing testing compiler-construction antlr catch2

What negative test cases are commonly used for compiler scanners/parsers?

I am working on a compiler for a language presented in a class. Against my better judgement, I got ahead of myself and never wrote tests for any compiler tests. My professor wants to see tests.

The language is C-like. I am using Catch2 and CTest to write tests. Here is an example of a positive scanner test:

TEST_CASE("Scanner keyword tests", "[front-end]") {
  antlr4::ANTLRInputStream input("int boolean str var func proc if then else while do select return length extern");
  WPLLexer lexer(&input);

  CHECK(lexer.nextToken()->getType() == lexer.INT);
  CHECK(lexer.nextToken()->getType() == lexer.BOOLEAN);
  CHECK(lexer.nextToken()->getType() == lexer.STR);
  CHECK(lexer.nextToken()->getType() == lexer.VAR);
  CHECK(lexer.nextToken()->getType() == lexer.FUNC);
  CHECK(lexer.nextToken()->getType() == lexer.PROC);
  CHECK(lexer.nextToken()->getType() == lexer.IF);
  CHECK(lexer.nextToken()->getType() == lexer.THEN);
  CHECK(lexer.nextToken()->getType() == lexer.ELSE);
  CHECK(lexer.nextToken()->getType() == lexer.WHILE);
  CHECK(lexer.nextToken()->getType() == lexer.DO);
  CHECK(lexer.nextToken()->getType() == lexer.SELECT);
  CHECK(lexer.nextToken()->getType() == lexer.RETURN);
  CHECK(lexer.nextToken()->getType() == lexer.LENGTH);
  CHECK(lexer.nextToken()->getType() == lexer.EXTERN);
}

I want to write some negative test cases, but have no idea where to start. What are some common negative test cases for compiler scanners and parsers?

Solution

There's no "common test case", either positive or negative, for compilers in general. Every language is different. There could be a corpus of test cases for a specific language; your best bet is to look in the tests directory of an open-source implementation for that language, if it exists. So this is basically a non-answer to your question, but I hope that it provides some useful information.

For lexers, you should attempt to fully exercise the lexical grammar. Many lexical grammars have few or no actual error conditions; any input can be decomposed into tokens of some sort. (A typical fallback is to emit any single character which cannot match any pattern as a token, which will then be rejected by the parser.) But there are a few exceptions to the above rule:

Character encoding errors (the bytes in the input cannot be decoded into valid code points);
Unterminated quoted literals and comments.

You should most certainly verify that your lexer behaves appropriately for such cases. You should also verify that your lexer doesn't choke on excessively long tokens, or weird escape sequences in quoted literals. One issue which is frequently mishandled by lexers is an end-of-input not preceded by a newline character. In an erroneous input, premature end-of-file could appear anywhere (for example, in the middle of a character string escape sequence); make sure your tests include these cases as well.

For parsers, you'll certainly want to ensure that every grammatical rule is tested by at least one input. There have been attempts to define algorithms for generating test sets with complete coverage; an internet search for "grammar test case generation" will produce a number of leads. All of them have some flaws, but the tools can help a lot.

Syntax errors are harder to test for. You can use a fuzzer --in fact, you probably should use a fuzzer-- but randomly perturbing a syntactically correct input might produce a different syntactically correct input. It's definitely verging into "art, not science".

If you make some attempt to produce specific, more useful syntax errors (which is probably not part of your assignment), then you should test each plausible correctable error case. Again, you'll most likely need to construct that test set by hand, taking into account the goals you had for producing error messages.

If you have time, you should also attempt to find inputs, valid or not, which push your parser into quadratic or worse parse times. (For production parsers, this test is necessary, but it might be outside of the scope of your compilers course.) Such issues could result from uncontrolled backtracking in a parser which backtracks (Antlr, for example). They can also be the result of apparently innocuous inefficiencies in hash-table implementations and other internal structures, which might be triggered by malicious inputs.