I've been thinking about how one would go about implementing this, but I haven't found a really good answer.
Essentially, the problem I've run into is that for an array of arbitrarily many dimensions, the lexer would need an equally large number of states to keep track of the nesting depth, which got me thinking about how expressive table-driven lexers can really be.
It's obvious to me that DFAs (deterministic finite automata) cannot count, at least not beyond a hard-coded limit. So how do table-driven lexers support multi-dimensional arrays?
Do languages impose a hard cap on how deeply arrays can nest, or do they use hand-written lexers to account for the need to count?
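
For concreteness, here's a minimal sketch of the kind of table-driven lexer I have in mind (the character classes, states, and token names are all made up for illustration). The transition table is fixed when the lexer is built, so I don't see where a nesting depth could live:

```python
# Minimal table-driven lexer sketch; classes, states, and token names
# are invented for illustration. The table is fixed at build time --
# there is nowhere to store a nesting depth.

def char_class(ch):
    if ch.isalnum():
        return "alnum"
    if ch in "[],":
        return "punct"
    return "other"

# DFA transition table: state -> char class -> next state.
TABLE = {
    "start": {"alnum": "ident", "punct": "punct"},
    "ident": {"alnum": "ident"},
    "punct": {},
}
ACCEPTING = {"ident": "IDENT", "punct": "PUNCT"}
PUNCT_KINDS = {"[": "LBRACKET", "]": "RBRACKET", ",": "COMMA"}

def lex(text):
    tokens, i, n = [], 0, len(text)
    while i < n:
        if text[i].isspace():          # whitespace separates tokens
            i += 1
            continue
        state, start, last_accept = "start", i, None
        j = i
        while j < n:
            nxt = TABLE[state].get(char_class(text[j]))
            if nxt is None:
                break
            state, j = nxt, j + 1
            if state in ACCEPTING:     # remember the longest match
                last_accept = (j, ACCEPTING[state])
        if last_accept is None:
            raise ValueError(f"unexpected input at {text[i:i + 10]!r}")
        end, kind = last_accept
        lexeme = text[start:end]
        if kind == "PUNCT":
            kind = PUNCT_KINDS[lexeme]
        tokens.append((kind, lexeme))
        i = end
    return tokens
```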
I assume that when you talk about lexing arrays, you're talking about array literals like `[x, y]` or array initializers like `{x, y}`. Neither of these would be lexed as a single token. They'd be lexed as `[`, `x`, `,`, `y`, `]` or `{`, `x`, `,`, `y`, `}` respectively. That applies to multi-dimensional arrays as well as one-dimensional ones.
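
For example, a two-dimensional literal just yields a longer flat stream of the same tokens. A hypothetical token stream for `[[a, b], [c]]` (the token names are made up; every lexer picks its own) would be:

```python
# Hypothetical flat token stream for the literal [[a, b], [c]].
# The brackets are ordinary punctuation tokens; the lexer never
# tracks how deeply they nest.
tokens = [
    ("LBRACKET", "["), ("LBRACKET", "["),
    ("IDENT", "a"), ("COMMA", ","), ("IDENT", "b"),
    ("RBRACKET", "]"), ("COMMA", ","),
    ("LBRACKET", "["), ("IDENT", "c"), ("RBRACKET", "]"),
    ("RBRACKET", "]"),
]
```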
Note that if you were to try to turn them into a single token, you'd run into problems even with one-dimensional arrays. Since array literals can usually contain arbitrary expressions, recognizing them as a single token would mean that the lexer has to parse arbitrary expressions, which also involves arbitrary nesting and is clearly out of the lexer's scope anyway.
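
The counting happens in the parser instead, whose stack (explicit in a table-driven LR parser, or the call stack in a recursive-descent one) handles arbitrary nesting depth. A minimal recursive-descent sketch over the hypothetical token stream above:

```python
# Minimal recursive-descent sketch over the hypothetical tokens above.
# The call stack tracks nesting depth, so no fixed limit is needed.
# (Error handling for unterminated input is omitted.)

def parse_expr(tokens, pos=0):
    """Parse an identifier or a (possibly nested) array literal."""
    kind, text = tokens[pos]
    if kind == "IDENT":
        return ("ident", text), pos + 1
    if kind == "LBRACKET":
        pos += 1
        elems = []
        while tokens[pos][0] != "RBRACKET":
            elem, pos = parse_expr(tokens, pos)
            elems.append(elem)
            if tokens[pos][0] == "COMMA":
                pos += 1
        return ("array", elems), pos + 1   # consume the closing ]
    raise SyntaxError(f"unexpected token {text!r}")

# parse_expr(tokens) on the stream above yields
# ("array", [("array", [("ident", "a"), ("ident", "b")]),
#            ("array", [("ident", "c")])])
```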