javascript regex compiler-construction

Get tokens from JavaScript code


I would like to write a simple interpreter in JavaScript/Node. I ran into an obstacle when it comes to generating tokens.

var code = 'if (a > 2 && b<4) c = 10;';

code.match(/\W+/g)
// [" (", " > ", " && ", "<", ") ", " = ", ";"]

code.match(/\w+/g)
// ["if", "a", "2", "b", "4", "c", "10"]

As shown, \W+ gives me the special characters and \w+ gives me the words. How can I get both in one array, in source order, something like the following?

// ["if", "(", "a", ">", "2", "&&", "b", "<", "4", ")", "c", "=", "10", ";"]

Solution

  • "As shown, \W+ lets me get special characters and \w+ lets me get words. I wonder how to get those in one array, something like below:"

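    Combine the two patterns with alternation, so each match is either a run of word characters or a run of non-word characters: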

    code.match(/\w+|\W+/g)
    

    This gives:

    ["if", " (", "a", " > ", "2", " && ", "b", "<", "4", ") ", "c", " = ", "10", ";"]
    

    The non-word tokens still carry their surrounding whitespace. Mapping trim over the result strips it:

    var tokens = code.match(/\w+|\W+/g).map(function(value){return value.trim()});
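The trimmed version works for the sample input, but two limitations are worth noting: words separated only by whitespace (for example 'a b') leave an empty string in the array, and adjacent punctuation such as '))' or ');' stays together as one token. A minimal sketch of one way around both, assuming that only the operators listed in the pattern need to stay together; the `tokenize` helper and its pattern are illustrative and not part of the original answer:

```javascript
// Hypothetical helper: splits source text into word and punctuation tokens.
function tokenize(source) {
  // \w+       : identifiers, keywords and numbers
  // &&|\|\|   : two-character logical operators
  // [<>=!]=?  : <, >, =, ! optionally followed by '=' (e.g. <=, ==, !=)
  // [^\s\w]   : any other single character that is neither space nor word
  var pattern = /\w+|&&|\|\||[<>=!]=?|[^\s\w]/g;
  // match() returns null when nothing matches, so fall back to an empty array.
  return source.match(pattern) || [];
}

var code = 'if (a > 2 && b<4) c = 10;';
console.log(tokenize(code));
// ["if", "(", "a", ">", "2", "&&", "b", "<", "4", ")", "c", "=", "10", ";"]
```

Because whitespace never matches any alternative, it is skipped between matches and no trimming or filtering is needed. Longer operators, string literals and comments are not handled by this pattern and would need their own alternatives.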