Tags: parsing, compiler-construction, token, interpreter

Parsing: Is holding all tokens in a list a bad idea?


In a recursive-descent parser with backtracking, is it a bad idea to hold all of the tokens in a list? I've read that this is a good approach unless there is an infinite number of tokens, but what about a large file where the number of tokens is very big? Would that have a noticeable effect on performance?


Solution

  • I wouldn't worry too much about it. If the file is so big that storing all tokens at once takes significant amounts of memory, then the abstract syntax tree will also take too much memory. More generally, you'll most likely need to have some representation of the whole file in memory sooner or later. Single-pass compilers aren't even possible for most modern languages, and the languages that go out of their way to enable a single-pass compiler (e.g., C) pay for it with a worse development experience.

    Additionally, assuming we aren't talking about a C/C++-style preprocessor model, reasonable source files are smaller than a megabyte, meaning that even with a very space-inefficient token data type you're realistically looking at no more than a few dozen megabytes. Any larger file is pathological and is the problem of the people writing such files, not your problem. (See the sketch below for what the token-list approach can look like in practice.)
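Below is a minimal sketch (in Python, since the question doesn't name a language) of the token-list approach: the lexer produces a plain list of tokens up front, and the parser backtracks simply by saving and restoring an integer index into that list. The `tokenize` helper, the `Parser` class, and the toy grammar are illustrative assumptions, not code from the question or answer.

```python
import re

def tokenize(source):
    """Split the whole input into a flat list of (kind, text) tokens."""
    token_spec = [("NUM", r"\d+"), ("OP", r"[+*()]"), ("SKIP", r"\s+")]
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in token_spec)
    return [(m.lastgroup, m.group())
            for m in re.finditer(pattern, source)
            if m.lastgroup != "SKIP"]

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0                      # backtracking = rewinding this index

    def mark(self):
        return self.pos

    def reset(self, mark):
        self.pos = mark

    def expect(self, kind):
        """Consume and return the next token if it has the given kind."""
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == kind:
            tok = self.tokens[self.pos]
            self.pos += 1
            return tok
        return None

    def expr(self):
        """Toy rule: expr -> NUM OP NUM | NUM, tried in that order."""
        start = self.mark()
        if self.expect("NUM") and self.expect("OP") and self.expect("NUM"):
            return "binary"
        self.reset(start)                 # backtrack: rewind the index
        if self.expect("NUM"):
            return "single"
        return None

tokens = tokenize("1 + 2")
print(Parser(tokens).expr())              # -> binary
```

The appeal of this design is that backtracking becomes trivially cheap: rewinding is just an integer assignment, and the token list itself is read-only, so arbitrarily deep speculative parses cost nothing beyond the index bookkeeping.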