Lexical analyser generator in kernel space

The Kernel Modular Debugger (kmdb) uses lex to generate a lexical analyser that runs in kernel space [1]. It relies on some hacks, some of which are not even POSIX-compatible.

I have a few questions:

What are the pitfalls of using lex in kernel mode?
Is there any reasonable way to adapt [1] for flex [2]?
What are the alternatives to lex/flex for kernel space?

(As a last resort I'm going to build and use illumos' lex, but I would really like to avoid that.)

[1] https://github.com/illumos/illumos-gate/blob/master/usr/src/cmd/mdb/common/mdb/mdb_lex.l

[2] https://github.com/westes/flex/

Solution

You should be able to use hacks like the ones in the illumos lex file to wrest control of I/O from flex. Alternatively, you can simply use flex's in-memory buffer functions: see yy_scan_string and yy_scan_buffer in the flex manual.

yy_scan_string makes flex copy the string, which may be necessary because flex modifies the contents of the buffer as it scans. If you don't mind your buffer being modified, want to avoid the copy, and are able to put two NULs at the end of the input instead of just one, you can use yy_scan_buffer instead.
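As a minimal sketch of the no-copy route, assuming the command text is already being read into a fixed buffer you own: reserve two spare bytes and terminate the input with two NULs before handing it to yy_scan_buffer. The helper name lex_terminate is hypothetical; only yy_scan_buffer and YY_BUFFER_STATE come from flex, and they appear only in a comment because this snippet is compiled without a generated scanner.

```c
#include <stddef.h>

/* Hypothetical helper: `buf` already holds `len` bytes of input.
 * Appends the two NUL bytes (flex's YY_END_OF_BUFFER_CHAR) that
 * yy_scan_buffer requires, without copying the input.
 * Returns the size to pass to yy_scan_buffer (len + 2), or 0 if
 * there is no room for the terminators.
 *
 * Intended use in the scanner driver (needs the flex-generated code):
 *   size_t n = lex_terminate(line, sizeof line, len);
 *   YY_BUFFER_STATE b = n ? yy_scan_buffer(line, n) : NULL;
 */
size_t lex_terminate(char *buf, size_t cap, size_t len)
{
    if (cap < 2 || len > cap - 2)
        return 0;           /* caller did not reserve two spare bytes */
    buf[len] = '\0';
    buf[len + 1] = '\0';
    return len + 2;         /* size argument includes both NULs */
}
```

The size passed to yy_scan_buffer counts the two trailing NULs. If they are missing, yy_scan_buffer returns NULL rather than creating a buffer, so checking its result catches a forgotten terminator.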

There is also a section in the flex manual about supplying your own memory allocation functions ("Overriding The Default Memory Management"), which you will probably need as well: declare %option noyyalloc noyyrealloc noyyfree and define yyalloc, yyrealloc and yyfree yourself. Flex doesn't allocate much memory besides its buffers, and if you provide your own buffer, you can make flex's buffer size arbitrarily small. That should make it possible to allocate the memory out of a fixed-length byte array. I don't know how small you can make it, but I would guess that you could get it down to a couple of hundred bytes if you're careful.
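A minimal sketch of that idea, assuming a non-reentrant scanner with no %option prefix (a reentrant scanner's versions take an extra yyscanner argument, and a prefix renames the functions). It backs yyalloc, yyrealloc and yyfree with a static bump allocator. LEX_POOL_SIZE, LEX_ALIGN and lex_pool_reset are hypothetical names, and the 512-byte pool is a guess you would tune, not a measured minimum.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: tune for your scanner. */
#define LEX_POOL_SIZE 512
#define LEX_ALIGN     16   /* assumed enough for flex's structs; >= sizeof(size_t) */

static _Alignas(LEX_ALIGN) unsigned char lex_pool[LEX_POOL_SIZE];
static size_t lex_pool_used;

static size_t lex_round_up(size_t n)
{
    return (n + LEX_ALIGN - 1) & ~(size_t)(LEX_ALIGN - 1);
}

/* Bump allocation: a LEX_ALIGN-byte header holding the payload size,
 * then the payload. Returns NULL (flex treats that as fatal) when full. */
void *yyalloc(size_t bytes)
{
    if (bytes > LEX_POOL_SIZE)
        return NULL;
    size_t need = LEX_ALIGN + lex_round_up(bytes);
    if (need > LEX_POOL_SIZE - lex_pool_used)
        return NULL;
    unsigned char *hdr = lex_pool + lex_pool_used;
    memcpy(hdr, &bytes, sizeof bytes);   /* remember size for yyrealloc */
    lex_pool_used += need;
    return hdr + LEX_ALIGN;
}

/* Always moves to a fresh block; the old block is not reclaimed
 * until lex_pool_reset(). On failure the old block stays valid. */
void *yyrealloc(void *ptr, size_t bytes)
{
    if (ptr == NULL)
        return yyalloc(bytes);
    size_t old;
    memcpy(&old, (unsigned char *)ptr - LEX_ALIGN, sizeof old);
    void *fresh = yyalloc(bytes);
    if (fresh != NULL)
        memcpy(fresh, ptr, old < bytes ? old : bytes);
    return fresh;
}

/* Reclaims space only when ptr is the most recent allocation. */
void yyfree(void *ptr)
{
    if (ptr == NULL)
        return;
    unsigned char *p = ptr;
    size_t size;
    memcpy(&size, p - LEX_ALIGN, sizeof size);
    if (p + lex_round_up(size) == lex_pool + lex_pool_used)
        lex_pool_used = (size_t)(p - LEX_ALIGN - lex_pool);
}

/* Hypothetical helper: discard everything, e.g. after yylex_destroy(). */
void lex_pool_reset(void)
{
    lex_pool_used = 0;
}
```

A bump allocator fits here because a debugger command is scanned once and then thrown away. Resetting the whole pool between commands avoids any general-purpose free-list logic in kernel context.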