I've been exploring writing a very basic / limited interpreter in javascript as an exercise. All has been going well until I introduced the concept of LOOPs.
Given the following script:
LOOP 2
A
LOOP 3
B
END
LOOP 4
C
LOOP 5
D
END
E
END
F
END
The algorithm should visit the inner tokens in the following sequence:
ABBBCDDDDDECDDDDDECDDDDDECDDDDDEFABBBCDDDDDECDDDDDECDDDDDECDDDDDEF
The following does the trick, but it requires lots of iterating over the tokens. It's an improvement over a previous slicing approach I used that manually expanded the loops, but is far from optimal.
/**
* In practice, we'll grab each token as we read the script,
* but to keep this simple and focus on the loop algorithm,
* we can cheat and make an array of all the tokens.
*/
const getTokens = (s) => s.replace(/[\W_]+/g, " ").split(" ").filter(Boolean);
/* Temp vars - ideally, I'd like to solve this with arrays. */
const start = []; // Loop start indices
const end = []; // Loop end indices
const counts = []; // Times to loop
const completed = []; // Loops completed
for (let i = 0; i < tokens.length; i++) {
const token = tokens[i];
if (token === "LOOP") {
if (start.length == 0 || i > start[start.length - 1]) {
// Add new loop index if we haven't seen it before
start.push(i); // Store the loop index
counts.push(Number(tokens[i + 1])); // The loop count is always next LOOP token
completed.push(0); // Initialize with 0 completed at index
// Find the end index for the loop
// Note: This is the slowest part.
let skip = 0;
for (let j = i + 2; j < tokens.length; j++) {
if (tokens[j] == "LOOP") {
skip++; // Increase nest depth
} else if (tokens[j] == "END") {
if (skip == 0) {
end.push(j); // Found matching loop close
break;
}
skip--;
}
}
}
i++; // Skip over the loop count
continue;
} else if (token === "END") {
let j;
for (j = 0; j < end.length; j++) {
if (end[j] == i) break; // Found matching end index
}
const isCompleted = completed[j] == counts[j] - 1;
if (!isCompleted) {
i = start[j] + 1;
completed[j]++;
for (let k = j + 1; k < start.length; k++) {
completed[k] = 0; // Reset nested loops in between
}
}
continue;
}
console.log(tokens[i]);
}
https://jsfiddle.net/5wpa8t4n/
What's a better way to accomplish this array-based approach using a single pass through the script, or at worst 2 passes, but not N-LOOP passes?
You don't need to know the position of the matching end
of the loop when starting to interpret it. All you need to record is the position to jump back to when encountering the next end
, but until then just continue interpreting token by token.
These positions, together with the respective counters, can be stored in a stack structure.
const script = `
DO
A
DO
B
LOOP 3
DO
C
DO
D
LOOP 5
E
LOOP 4
F
LOOP 2
`
const parse = (script) =>
script
.replace(/[\W_]+/g, " ")
.split(" ")
.filter(Boolean);
const interpret = (code) => {
let loops = []; // Active loops: iteration count and jump target
let ip = 0; // instruction pointer
let result = "";
while (ip < code.length) {
const instruction = code[ip];
switch (instruction) {
case "DO": {
++ip;
loops.push({count: 0, start: ip});
} break;
case "LOOP": {
const limit = Number(code[++ip]);
const {count, start} = loops.pop();
if (count < limit) {
loops.push({count: count+1, start});
ip = start; // jump back
} else {
++ip;
}
} break;
default: {
++ip;
result += instruction; // a print statement
} break;
}
}
return result;
};
console.log(interpret(parse(script)));
I've simplified the structure a bit to use do
-while
loops, so I'd never have to skip the loop body. In a true byte code, emitted by a parser, the jump targets (both back and forth) would be part of the instructions themselves, and only the count "variables" would need to be stored in the stack. The jump targets never change so you'd need to generate them only once in the parse
function.