Search code examples
compilationcompiler-constructioninterpreter

How does an interpreter translate a for loop?


I understand that interpreter translates your source code into machine code line by line, and stops when it encounters an error.

I am wondering, what does an interpreter do when you give it for loops.

E.g. I have the following (MATLAB) code:

for i = 1:10000

pi*pi

end

Does it really run through and translate the for loop line by line 10000 times?

With compilers is the machine code shorter, consisting only of a set of statements that include a go to control statement that is in effect for 10000 iterations.

I'm sorry if this doesn't make sense, I don't have very good knowledge of the underlying bolts and nuts of programming but I want to quickly understand.


Solution

  • I understand that interpreter translates your source code into machine code line by line, and stops when it encounters an error.

    This is wrong. There are many different types of interpreters, but very few execute code line-by-line and those that do (mostly shells) don't generate machine code at all.

    In general there are four more-or-less common ways that code can be interpreted:

    • Statement-by-statement execution

      This is presumably what you meant by line-by-line, except that usually semicolons can be used as an alternative for line breaks. As I said, this is pretty much only done by shells.

      How this works is that a single statement is parsed at a time. That is the parser reads tokens until the statement is finished. For simple statements that's until a statement terminator is reached, e.g. the end of line or a semicolon. For other statements (such as if-statements, for-loops or while-loops) it's until the corresponding terminator (endif, fi, etc.) is found. Either way the parser returns some kind of representation of the statement (some type of AST usually), which is then executed. No machine code is generated at any point.

      This approach has the unusual property that syntax error at the end of the file won't prevent the beginning of the file from being executed. However everything is still parsed at most once and the bodies of if-statements etc. will still be parsed even if the condition is false (so a syntax error inside an if false will still abort the script).

    • AST-walking interpretation

      Here the whole file is parsed at once and an AST is generated from it. The interpreter then simply walks the AST to execute the program. In principle this is the same as above, except that the entire program is parsed first.

    • Bytecode interpretation

      Again the entire file is parsed at once, but instead of an AST-type structure the parser generates some bytecode format. This bytecode is then executed instruction-by-instruction.

    • JIT-compilation

      This is the only variation that actually generates machine code. There are two variations of this:

      • Generate machine code for all functions before they're called. This might mean translating the whole file as soon as it's loaded or translating each function right before it's called. After the code has been generated, execute it.
      • Start by interpreting the bytecode and then JIT-compile specific functions or code paths individually once they have been executed a number of times. This allows us to make certain optimizations based on usage data that has been collected during interpretation. It also means we don't pay the overhead of compilation on functions that aren't called a lot. Some implementations (specifically some JavaScript engines) also recompile already-JITed code to optimize based on newly gathered usage data.

    So in summary: The overlap between implementations that execute the code line-by-line (or rather statement-by-statement) and implementations that generate machine code should be pretty close to zero. And those implementations that do go statement-by-statement still only parse the code once.