java lexer cad jflex cnc

Lexer for CAD NC programs


I'm evaluating ways to track tool movement using NC programs in different formats as input. Using a lexer to tokenize the different program types into a meta layer, where only uniform tools, points, and so on exist, seemed like a good idea.

But,

  • I don't know anything about lexical analysis. Is there an easy way to create a lexer? Maybe from an EBNF grammar?
  • What do you think about my approach? Do you see a more viable way to extract the data and support multiple NC file formats?

Additional information

  • The concrete type of NC program being provided is known beforehand.
  • I don't have to check the syntax of the NC program. I assume that they are valid, since they are already used in production.

Solution

  • A lexer could be a useful way to tokenize the input stream of commands. A lexer is usually generated by giving a lexer generator a set of regular expressions. The generated lexer then matches your input against those expressions and gives you back the matched text together with the token type that matched. JFlex would be a reasonable choice for a lexer generator.

    EBNF is used to create parsers, which may or may not be what you need. Parsers are typically built on top of lexers to create a syntax tree out of a token stream. A lexer cannot enforce rules such as "token A must be followed by token B or C", but a parser can. There are many different parser generators for Java, each with pros and cons. ANTLR is a stable one that you might consider looking into.

    To support multiple formats, you'd probably need to generate a separate lexer for each one (or a separate parser, if you go down that route) and route each file to the lexer for its format.
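To make the "regular expressions in, tokens out" idea concrete, here is a minimal hand-written sketch using only `java.util.regex`. It is not JFlex output and does not use the JFlex API; it only imitates what a generated lexer does. The class name `NcLexer`, the token kinds, and the G-code-like rules (an address letter followed by a number, parenthesised comments) are all assumptions for illustration, and a real NC dialect will need different rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical lexer for an assumed G-code-like NC dialect (illustration only). */
final class NcLexer {

    /** Token kinds of the assumed uniform "meta layer". */
    enum Kind { WORD, COMMENT, NEWLINE }

    /** A matched token: its kind plus the exact text that matched. */
    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // One named group per rule; alternatives are tried left to right.
    private static final Pattern RULES = Pattern.compile(
          "(?<WORD>[A-Za-z][+-]?(?:\\d+\\.?\\d*|\\.\\d+))" // address + number: G01, X10.5, Y-3
        + "|(?<COMMENT>\\([^)]*\\))"                      // parenthesised comment
        + "|(?<NEWLINE>\\r?\\n)"                          // end of an NC block
        + "|(?<SKIP>[ \\t]+)");                           // blanks and tabs, dropped

    /** Splits the input into tokens; throws if no rule matches at some offset. */
    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = RULES.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) { // no rule matches at the current position
                throw new IllegalArgumentException("Unexpected character at offset " + pos);
            }
            // Emit a token for whichever named rule matched; SKIP emits nothing.
            for (Kind kind : Kind.values()) {
                if (m.group(kind.name()) != null) {
                    tokens.add(new Token(kind, m.group()));
                    break;
                }
            }
            pos = m.end();
        }
        return tokens;
    }
}
```

Calling `NcLexer.tokenize("G01 X10.5 Y-3 (rapid)\n")` yields three `WORD` tokens, one `COMMENT`, and one `NEWLINE`. Since the NC format is known beforehand, one plausible design is one such rule set (or one JFlex specification) per format, all emitting the same token kinds, with the caller picking the lexer that belongs to the file's format.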