
fslex: lexing JavaScript regular expressions


I am attempting to lex JavaScript regular expression literals. These start with a "/" and end with a "/" (optionally followed by modifier flags). The issue is that the only way to tell a regular expression from a division operator is to look at the token that precedes the "/" character.

One can read a little more on this here.

As it is, I can't find any documentation on how to get the previous token. Hopefully this is possible and someone can tell me how.

Thanks.


Solution

  • To get around this issue I created a module that keeps track of the last token and checks it against a list of tokens that cannot precede a regex, to decide whether "/" is a division operator or the start of a regex.

    The code is below:

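    // Most recently emitted token. EOF doubles as the "nothing lexed yet" state.
    let mutable lastToken:token = EOF
    
    // Record the token and return it unchanged, so a rule action can simply wrap its result.
    let setToken token =
        lastToken <- token
        token
    
    // Decide whether a "/" is a division operator or the start of a regex literal.
    // invalidRegexPrefix (not shown here) holds the names of tokens that cannot precede a regex.
    let parseDivision (lexbuf:Lexing.lexbuf) (tokenizer:Lexing.LexBuffer<'a> -> JavascriptParser.token) regexer =
        match lastToken.GetType().Name with
        | x when invalidRegexPrefix |> List.contains x -> DIVIDE
        | _ -> 
            // regexer is the lexer rule that reads the regex body, starting from an empty accumulator.
            let result = regexer lexbuf.StartPos "" lexbuf
            REGEX(result)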
    

    And then inside the lexer I call setToken on the result of the rule. For example:

    | '(' { setToken LPAREN }
    

    setToken both records the last token and returns the token that was just set. Returning it is only there to keep the change as unintrusive as possible in the actual lexer code.

    The actual rule for the "/" character is:

    | "/"   { setToken (parseDivision lexbuf token regex) }
    

    One also needs to reset the token to EOF once parsing is completed. Otherwise the next parse may start in an inconsistent state, since the last token is held in a static variable that outlives a single parse.
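
For illustration, here is a minimal, self-contained sketch of the same idea that does not depend on fslex. It is not the code from the answer: the `Token` type is a cut-down, hypothetical stand-in for the fsyacc-generated one, `endsExpression` and `LexerState` are names invented for this example, and `readRegexBody` is a placeholder for the real regex sub-rule. It differs from the answer in two ways that are design choices rather than requirements: the "/" decision is made by pattern matching on the token instead of comparing type names, and the last token lives in an object created per lexing run, so nothing has to be reset afterwards.

```fsharp
// Hypothetical, cut-down token type standing in for the fsyacc-generated one.
type Token =
    | IDENT of string
    | NUMBER of float
    | LPAREN
    | RPAREN
    | RBRACKET
    | ASSIGN
    | DIVIDE
    | REGEX of string
    | EOF

// Tokens that can end an expression. A "/" directly after one of these
// is treated as division; after anything else it starts a regex literal.
// This list is an assumption for the sketch and is not exhaustive.
let endsExpression token =
    match token with
    | IDENT _ | NUMBER _ | RPAREN | RBRACKET -> true
    | _ -> false

// Per-run lexer state. A fresh instance starts at EOF ("nothing lexed yet"),
// so there is no shared static value to reset between parses.
type LexerState() =
    let mutable lastToken = EOF

    member this.LastToken = lastToken

    // Record the token and hand it back, like setToken in the answer.
    member this.Set(token: Token) =
        lastToken <- token
        token

    // Decide what a "/" means given the previous token.
    // readRegexBody is only called when a regex literal is expected.
    member this.Slash(readRegexBody: unit -> string) =
        if endsExpression lastToken then
            this.Set(DIVIDE)
        else
            this.Set(REGEX(readRegexBody ()))

// Usage: "a / ..." versus "x = /ab+c/".
let state = LexerState()
state.Set(IDENT "a") |> ignore
let afterIdent = state.Slash(fun () -> "unused")   // previous token ends an expression

state.Set(ASSIGN) |> ignore
let afterAssign = state.Slash(fun () -> "ab+c")    // previous token cannot end an expression

printfn "%A" afterIdent
printfn "%A" afterAssign
```

In a real fslex file, each rule action would call `Set` on its token in the same way the answer calls `setToken`, and the "/" rule would call `Slash` with a function that runs the regex sub-rule on the current `lexbuf`.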