OCamllex matching beginning of line?


I am messing around writing a toy programming language in OCaml with ocamllex, and was trying to make the language sensitive to indentation changes, Python-style, but am having a problem matching the beginning of a line with ocamllex's regex rules. I am used to using ^ to match the beginning of a line, but in OCaml that is the string concatenation operator. Google searches haven't been turning up much for me unfortunately :( Anyone know how this would work?


Solution

  • As far as I know, ocamllex has no zero-length anchors like ^ in Perl-style regular expressions (which match a position rather than a substring); the only pattern of that kind is eof, which matches the end of the input. However, you can have your lexer turn newlines into explicit tokens, something like this:

    parser.mly

    %token EOL
    %token <int> EOLWS
    /* other declarations here (statement tokens, %start main, %type <...> main) */
    %%
    main:
        EOL stmt                { MyStmtDataType(0, $2) }
      | EOLWS stmt              { MyStmtDataType($1 - 1, $2) (* $1 also counts the '\n' *) }
     ;
    

    lexer.mll

    {
     open Parser
     exception Eof
    }
    rule token = parse
        [' ' '\t']           { token lexbuf }     (* skip other blanks *)
      | ['\n'][' ']+ as lxm  { Lexing.new_line lexbuf;
                               EOLWS (String.length lxm) }  (* length includes the '\n' *)
      | ['\n']               { Lexing.new_line lexbuf; EOL }
      (* ... *)
      | eof                  { raise Eof }
    

    This is untested, but the general idea is:

    • Treat newlines as statement 'starters'
    • Measure whitespace that immediately follows the newline and pass its length as an int
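    To check what token stream those two newline rules should produce without setting up ocamllex and ocamlyacc, here is a small hand-written OCaml sketch that mimics only those rules. The `tok` type and the `lex_newlines` function are hypothetical stand-ins for the generated `Parser` tokens and lexer; every character that is not part of a newline-plus-spaces run is simply skipped here rather than lexed into statements.

```ocaml
(* Hypothetical token type mirroring %token EOL and %token <int> EOLWS. *)
type tok = EOL | EOLWS of int

(* Mimic the two newline rules: '\n' followed by one or more spaces
   becomes EOLWS n, where n counts the newline too (like
   String.length lxm in the lexer); a bare '\n' becomes EOL.
   All other characters are skipped in this sketch. *)
let lex_newlines (s : string) : tok list =
  let len = String.length s in
  (* Index of the first non-space character at or after j. *)
  let rec skip_spaces j =
    if j < len && s.[j] = ' ' then skip_spaces (j + 1) else j
  in
  let rec go i acc =
    if i >= len then List.rev acc
    else if s.[i] = '\n' then begin
      let j = skip_spaces (i + 1) in
      (* j - i = the newline plus the spaces after it. *)
      let n = j - i in
      if n > 1 then go j (EOLWS n :: acc) else go j (EOL :: acc)
    end
    else go (i + 1) acc
  in
  go 0 []

let () =
  lex_newlines "\nif x:\n    y = 1\nz = 2"
  |> List.iter (function
       | EOL -> print_endline "EOL"
       | EOLWS n -> Printf.printf "EOLWS %d\n" n)
```

    For that input it prints EOL, EOLWS 5, EOL; the grammar's `$1 - 1` turns the 5 back into an indentation of 4 spaces.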

    Caveat: since every statement must be preceded by EOL or EOLWS, you will need to preprocess your input so that it starts with a \n if it doesn't already.
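    The preprocessing in the caveat is a one-liner. Going further toward Python-style blocks (this part is not in the answer above) you could turn each line's width, 0 for EOL and n - 1 for EOLWS n, into INDENT/DEDENT tokens using the stack of open indentation levels that the Python language reference describes. The sketch below assumes hypothetical names (`layout`, `Inconsistent_dedent`, `layout_of_widths`) and only handles spaces, not tabs or blank-line skipping.

```ocaml
(* Hypothetical layout tokens for a Python-style grammar. *)
type layout = INDENT | DEDENT

exception Inconsistent_dedent of int

(* The caveat: make sure the first statement is also preceded by '\n'. *)
let ensure_leading_newline (s : string) : string =
  if String.length s > 0 && s.[0] = '\n' then s else "\n" ^ s

(* Given the stack of open indentation widths (innermost first, bottom
   is 0) and the next line's width, return the layout tokens to emit
   and the updated stack. *)
let layout_tokens (stack : int list) (width : int) : layout list * int list =
  match stack with
  | [] -> invalid_arg "layout_tokens: empty stack"
  | top :: _ when width > top -> ([INDENT], width :: stack)
  | _ ->
      (* Pop every level deeper than [width], one DEDENT per level;
         the width must land exactly on an enclosing level. *)
      let rec pop st dedents =
        match st with
        | top :: rest when width < top -> pop rest (DEDENT :: dedents)
        | top :: _ when width = top -> (dedents, st)
        | _ -> raise (Inconsistent_dedent width)
      in
      pop stack []

(* Process a whole file's line widths, closing open blocks at the end. *)
let layout_of_widths (widths : int list) : layout list =
  let step (acc, stack) w =
    let toks, stack' = layout_tokens stack w in
    (List.rev_append toks acc, stack')
  in
  let acc, stack = List.fold_left step ([], [0]) widths in
  let closing =
    List.filter_map (fun lvl -> if lvl > 0 then Some DEDENT else None) stack
  in
  List.rev_append acc closing
```

    For widths [0; 4; 4; 8; 0] this yields [INDENT; INDENT; DEDENT; DEDENT]. A width that falls between two open levels, such as the 2 in [0; 4; 2], raises Inconsistent_dedent, roughly what Python reports as an IndentationError.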