OCamllex matching beginning of line?

I am messing around writing a toy programming language in OCaml with ocamllex, and was trying to make the language sensitive to indentation changes, python-style, but am having a problem matching the beginning of a line with ocamllex's regex rules. I am used to using ^ to match the beginning of a line, but in OCaml that is the string concat operator. Google searches haven't been turning up much for me unfortunately :( Anyone know how this would work?

Solution

I'm not sure if there is explicit support for zero-length matching symbols (like ^ in Perl-style regular expressions, which matches a position rather than a substring). However, you should be able to let your lexer turn newlines into an explicit token, something like this:

parser.mly

%token EOL
%token <int> EOLWS
% other stuff here
%%
main:
    EOL stmt                { MyStmtDataType(0, $2) }
  | EOLWS stmt              { MyStmtDataType($1 - 1, $2) }
 ;

lexer.mll

{
 open Parser
 exception Eof
}
rule token = parse
    [' ' '\t']           { token lexbuf }     (* skip other blanks *)
  | ['\n'][' ']+ as lxm  { EOLWS(String.length(lxm)) }
  | ['\n']               { EOL }
  (* ... *)

This is untested, but the general idea is:

Treat newlines as staetment 'starters'
Measure whitespace that immediately follows the newline and pass its length as an int

Caveat: you will need to preprocess your input to start with a single \n if it doesn't contain one.