
Why do I get a fatal JavaScript error while generating tokens from Lisp code?


I understand that something is exceeding a memory limit, but I do not understand what it is.

For context: this program takes Lisp code as input and converts it to tokens.

// regex tests
const LETTERS = /[-_a-zA-Z0-9]/;
const WHITESPACE = /\s/;
const NUMBERS = /\d/;
const RESERVEDWORDS = ['write-line', ';', 'write', /* ... */]; // other reserved words that I did not list out

// lexing
module.exports = function tokenizer(input) {
    const tokenList = [];
    let i = 0;
    while (i < input.length) {
        console.log(tokenList[i]);
        let char = input[i];
        if (char === '(' || char === ')') {
            tokenList.push(
                {
                    type: 'paren',
                    value: char 
                }
            );
            i++;
            continue;
        }
        
        if (LETTERS.test(char)) {
            let value = '';
            while (LETTERS.test(char)) {
                value += char;
                char = input[++i];
            }
            console.log(value);
            if (RESERVEDWORDS.includes(value)) {
                tokenList.push(
                    {
                        type: 'keyword',
                        value 
                    }
                )
            }
            else {
                tokenList.push(
                    {
                        type: 'identifier',
                        value
                    }
                );
            }
            continue;
        }

        if (WHITESPACE.test(char)) {
            i++;
            continue;
        }

        if (NUMBERS.test(char)) {
            let value = '';
            while (NUMBERS.test(char)) {
                value += char;
                char = input[++i];
            }

            tokenList.push(
                {
                    type: 'number',
                    value
                }
            );
            continue;
        }
        /*
            .
            .
            .
             more token conversion condition checks
            .
            .
            .
        */

        throw new TypeError(`NameError: ${char}`);
    }
    return tokenList;
}


I have tried to figure out whether the problem is with the regex or with some other part of the code, but to no avail. The token conversion is successful, but then I get "Fatal error in , line 0 Fatal JavaScript invalid size error 169220804", and I cannot figure out why.


Solution

  • // regex tests
    const LETTERS = /[-_a-zA-Z0-9]/;
    const WHITESPACE = /\s/;
    const NUMBERS = /\d/;
    const RESERVEDWORDS = ['write-line', ';', 'write']; // other reserved words omitted
    
    // lexing
    function tokenizer(input) {
      const tokenList = [];
      let i = 0;
      while (i < input.length) {
        console.log(tokenList[i]);
        let char = input[i];
        if (char === '(' || char === ')') {
          tokenList.push({
            type: 'paren',
            value: char
          });
          i++;
          continue;
        }
    
        if (char && LETTERS.test(char)) {
          let value = '';
          while (char && LETTERS.test(char)) {
            value += char;
            char = input[++i];
          }
          if (RESERVEDWORDS.includes(value)) {
            tokenList.push({
              type: 'keyword',
              value
            })
          } else {
            tokenList.push({
              type: 'identifier',
              value
            });
          }
          continue;
        }
    
        if (char && WHITESPACE.test(char)) {
          i++;
          continue;
        }
    
        if (char && NUMBERS.test(char)) {
          let value = '';
          while (char && NUMBERS.test(char)) {
            value += char;
            char = input[++i];
          }
    
          tokenList.push({
            type: 'number',
            value
          });
          continue;
        }
        /*
            .
            .
            .
             more token conversion condition checks
            .
            .
            .
        */
    
        throw new TypeError(`NameError: ${char}`);
      }
      return tokenList;
    }
    
    console.log(tokenizer('test'));

    The error you are seeing occurs when you try to use an array index, or build a string, that is too large for the platform. You are getting it because of lines like input[++i]: i can keep growing until the script crashes.

    Now, the reason it gets so big is here:

    while (LETTERS.test(char)) {
      value += char;
      char = input[++i];
    }
    

    When i becomes larger than the input's length, char becomes undefined. The RegExp.prototype.test() method always coerces its argument to a string, so undefined becomes "undefined", which matches LETTERS, and the loop runs indefinitely until a fatal error stops it.
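    You can see this coercion for yourself in a small standalone snippet (not part of the original code):

    ```javascript
    const LETTERS = /[-_a-zA-Z0-9]/;

    // Indexing a string past its end yields undefined, not an error
    const input = 'abc';
    console.log(input[99]); // undefined

    // test() coerces its argument to a string, so undefined is
    // tested as the literal string "undefined" -- which contains letters
    console.log(LETTERS.test(undefined)); // true
    console.log(String(undefined));       // "undefined"
    ```

    This is why the inner while loop never sees a failing test once i runs off the end of the input.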

    I've simply added checks to your code to make sure char holds a value, so the function returns correctly.
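    An equivalent fix, if you would rather not repeat the char && guard in every condition, is to bound the inner loops on the index itself. A sketch of that approach (readWord is a hypothetical helper, not part of the original answer):

    ```javascript
    const LETTERS = /[-_a-zA-Z0-9]/;

    // Consume a run of letter characters starting at index i,
    // stopping at the end of the input instead of relying on
    // char being truthy.
    function readWord(input, i) {
      let value = '';
      while (i < input.length && LETTERS.test(input[i])) {
        value += input[i++];
      }
      return { value, next: i };
    }

    console.log(readWord('write-line 42', 0)); // { value: 'write-line', next: 10 }
    ```

    Either way, the key point is the same: the loop must have a termination condition that does not depend on test() rejecting undefined, because it never will.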