testing compiler-construction interpreter brainfuck

How would one go about testing an interpreter or a compiler?

I've been experimenting with creating an interpreter for Brainfuck, and while it was quite simple to get up and running, part of me wants to be able to run tests against it. I can't fathom how many tests one would have to write to cover all the possible instruction combinations and ensure that the implementation is correct.

Obviously, with Brainfuck, the instruction set is small, but I can't help thinking that as more instructions are added, the test code would grow exponentially, more so than typical tests at any rate.

Now, I'm about as newbie as you can get in terms of writing compilers and interpreters, so my assumptions could very well be way off base.

Basically, where do you even begin with testing on something like this?

Solution

Testing a compiler is a little different from testing some other kinds of apps, because it's OK for the compiler to produce different assembly-code versions of a program as long as they all do the right thing. However, if you're just testing an interpreter, it's pretty much the same as any other text-based application. Here is a Unix-centric view:

You will want to build up a regression test suite. Each test should have:
- Source code you will interpret, say test001.bf
- Standard input to the program you will interpret, say test001.0
- What you expect the interpreter to produce on standard output, say test001.1
- What you expect the interpreter to produce on standard error, say test001.2 (you care about standard error because you want to test your interpreter's error messages)
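
As a concrete illustration, one test case could be set up as in the sketch below. The program `,[.,]` is the usual Brainfuck "cat"; it assumes the interpreter stores 0 in the current cell at end of input, which not every implementation does. The file contents are invented for this example.

```sh
# Hypothetical test case: a "cat" program that echoes its input.
# Assumes the interpreter reads 0 at end of input (conventions vary).
printf '%s\n' ',[.,]' > test001.bf   # source code to interpret
printf 'hello\n' > test001.0         # standard input for the program
printf 'hello\n' > test001.1         # expected standard output
: > test001.2                        # expected standard error (empty)
```

Writing the first few expected-output files by hand like this keeps them independent of the interpreter under test.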

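You will need a "run test" script that does something like the following:

```
#!/bin/sh
# Report a mismatch between expected and actual output, then stop.
fail() {
  echo "Unexpected differences on $1:"
  diff "$2" "$3"
  exit 1
}

# Each argument is a test name such as test001.
for testname
do
  tmp1=$(mktemp)   # actual standard output
  tmp2=$(mktemp)   # actual standard error
  brainfuck "$testname.bf" < "$testname.0" > "$tmp1" 2> "$tmp2"
  # cmp -s is silent and reports the result through its exit status.
  cmp -s "$testname.1" "$tmp1" || fail "stdout" "$testname.1" "$tmp1"
  cmp -s "$testname.2" "$tmp2" || fail "stderr" "$testname.2" "$tmp2"
  rm -f "$tmp1" "$tmp2"
done
```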

You will find it helpful to have a "create test" script that does something like:
```
brainfuck "$testname.bf" < "$testname.0" > "$testname.1" 2> "$testname.2"
```
You run this only when you're totally confident that the interpreter works for that case.

- Keep your test suite under source control.
- It's convenient to embellish your test script so you can leave out files that are expected to be empty.
- Any time anything changes, re-run all the tests. You probably also want to re-run them all nightly via a cron job.
- Finally, add enough tests to get good coverage of your interpreter's (or compiler's) source code. The quality of coverage tools varies widely, but GNU Gcov is an adequate one for code built with GCC.
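
One way to let the test script tolerate left-out files, as suggested above, is to fall back to /dev/null whenever a file does not exist. This is only a sketch; the helper name `file_or_empty` is invented for illustration.

```sh
# Hypothetical helper: print the given file name if the file exists,
# otherwise /dev/null, which behaves like an empty file for cmp and diff.
file_or_empty() {
  if [ -f "$1" ]; then
    echo "$1"
  else
    echo /dev/null
  fi
}

# Possible use inside the test loop:
#   exp1=$(file_or_empty "$testname.1")
#   cmp -s "$exp1" "$tmp1" || fail "stdout" "$exp1" "$tmp1"
```

The same helper could be applied to the standard-input file, so a test that reads no input needs only its source file and expected output.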

Good luck with your interpreter! If you want to see a lovingly crafted but not very well documented testing infrastructure, go look at the test2 directory for the Quick C-- compiler.