Applying YACC to GCODE (GRBL)

GCode is a language used to tell multi-axis (CNC) machines how to move. It looks like this:

```
M3 S5000 (Start Spindle Clockwise at 5000 RPM)
G21 (All units in mm)
G00 Z1.000000 (lift Z axis up by 1mm)
G00 X94.720505 Y-14.904622 (Go to this XY coordinate)
G01 Z0.000000 F100.0 (Penetrate at 100mm/min)
G01 X97.298434 Y-14.870127 F400 (cut to here)
G03 X98.003848 Y-14.275867 I-0.028107 J0.749174 (cut an arc)
G00 Z1.000000 (lift Z axis)
(etc.)
```

I have laid these commands out one per line, but each token could be on a separate line. In fact, there is no rule that a number must be written directly against its code letter (G01 and G 01 are both possible). Still, I already have a Lex lexer that gives me the tokens described below.

Certain commands (M or G codes) take parameters. M3 can have an S (spindle speed) parameter; G0 and G1 can have X, Y, Z, F, etc.; G3 can have X, Y, Z, I, J, R, and so on. However, a G code does not require all of those parameters: it may have one, several, or all of them.

Note that we are cutting a single path and then lifting the Z axis. That is, we move to a location above the work surface, penetrate, cut a path, then lift off. I would call this a 'block' or a 'path', and it is this that I'm interested in.

I need to be able to parse GCode in any messy format and then build a structure of 'blocks', where a block is any series of 'commands' between a Z-axis down and a Z-axis up.

I can tokenise this language using Lex (specifically Python's PLY) and get:

```
type M value 3
type S value 5000
type COMMENT value "Start Spindle Clockwise at 5000 RPM"
type G value 21
type COMMENT value "All units in mm"
type G value 0
type Z value 1.0
etc.
```
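For experimenting without PLY, here is a minimal regex-based sketch of the same tokenisation. It is not the author's lexer; it assumes a word is one letter followed by an optional-sign decimal number (whitespace allowed in between), that comments are parenthesised, and that tokens are plain `(type, value)` tuples. The `tokenize` name and `TOKEN_RE` are illustrative only.

```python
import re

# A word is one letter followed by a signed decimal number; whitespace between
# the letter and the number is tolerated. Comments are (parenthesised).
TOKEN_RE = re.compile(
    r"\((?P<comment>[^)]*)\)"
    r"|(?P<letter>[A-Za-z])\s*(?P<number>[-+]?(?:\d+(?:\.\d*)?|\.\d+))"
)


def tokenize(text):
    """Return a list of (type, value) tokens such as ('G', 0) or ('Z', 1.0)."""
    tokens = []
    pos = 0
    while True:
        # Skip spaces, tabs and newlines between tokens.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return tokens
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}: {text[pos:pos + 10]!r}")
        if m.group("comment") is not None:
            tokens.append(("COMMENT", m.group("comment").strip()))
        else:
            letter = m.group("letter").upper()
            value = float(m.group("number"))
            # G and M codes are usually whole numbers (G01 -> 1); keep
            # decimal codes such as G38.2 as floats.
            if letter in "GM" and value.is_integer():
                value = int(value)
            tokens.append((letter, value))
        pos = m.end()


print(tokenize("G 00X94.720505 Y-14.904622 (go)"))
# [('G', 0), ('X', 94.720505), ('Y', -14.904622), ('COMMENT', 'go')]
```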

Now, using Yacc, I need a rule for a thing called a 'command'.

A command is either a comment, or a 'G' or 'M' code followed by any of its applicable parameter codes (X, Y, Z, etc.). A command ends when another command (a comment, G or M) is encountered.

Then I need a thing called a 'block', where a block is any set of 'commands' that come after a Z down and before a Z up.

There are 100 G codes, 100 M codes and 24 parameter codes (A-Z minus G and M).

A rule for 'command' might look like:

```
command : G F H I J K L S T W X Y Z (how to specify ONE OF)
        | M S F (How to specify one of)
        | COMMENT
```

And then how would I define a block?

I realise this is a very long post, but can anyone give me even an idea of whether YACC can do this? Otherwise I'll just write some code that converts the Lex tokens into a tree manually.

Addendum @rici

Thank you for taking the time to understand this question. By way of feedback: my full task is to get YACC to do the heavy lifting of separating chunks of code into blocks, based on different use cases.

For example, when engraving, a block often represents a letter or some other shape in the XY plane. So a block is defined by the movement of the Z axis into and out of the XY plane.

I want to be able to post-process blocks:

- Hatch fill a block. This involves some fairly complicated calculation of path boundaries, tangents to those boundaries, tool diameter, etc. This is the most pressing use case and I don't have a good solution for it yet, but I know it can be done because Inkscape (a vector graphics application) does it.
- Rotate by n degrees. A fairly simple coordinate transformation; I already have a solution for this.
- Iteratively deepen (extrude). Copy blocks and adjust the Z depth on each iteration. Simple.
- etc.
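The rotate and deepen steps can be sketched as plain functions over a block. This assumes a hypothetical representation in which a block is a list of commands and each command is a dict of word letters to values (e.g. `{"G": 1, "X": 2.0}`); it also assumes absolute coordinates (G90), rotation about the origin, and a work surface at Z=0, as in the question's example. Because X and Y are modal, a missing coordinate is taken from the current position before rotating; I and J are offsets from the arc start, so they are rotated but not shifted, and an R radius needs no change.

```python
import math


def rotate_block(block, degrees, start=(0.0, 0.0)):
    """Rotate the XY geometry of a block about the origin (absolute G90 assumed)."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    x, y = start  # unrotated tool position when the block begins
    rotated = []
    for cmd in block:
        new = dict(cmd)  # copy, so the input block is left untouched
        if "X" in cmd or "Y" in cmd:
            # X and Y are modal: fill in whichever one is missing.
            x, y = cmd.get("X", x), cmd.get("Y", y)
            new["X"], new["Y"] = x * c - y * s, x * s + y * c
        if "I" in cmd or "J" in cmd:
            # An omitted I or J means a zero offset.
            i, j = cmd.get("I", 0.0), cmd.get("J", 0.0)
            new["I"], new["J"] = i * c - j * s, i * s + j * c
        rotated.append(new)
    return rotated


def deepen(block, passes, step, surface=0.0):
    """Repeat a block `passes` times, lowering each cutting-depth Z by `step`.

    Only Z words at or below `surface` are treated as cutting depths; higher
    (clearance) Z words are copied unchanged.
    """
    result = []
    for n in range(passes):
        for cmd in block:
            new = dict(cmd)
            if "Z" in new and new["Z"] <= surface:
                new["Z"] = new["Z"] - n * step
            result.append(new)
    return result
```

Tracking the modal position is the part that is easy to miss: rotating a command that only carries X would otherwise move the tool along the wrong line.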

Solution

If you just want to ensure that a G command is followed by at least one modifier, you can do this:

```
g_modifier: F | H | I | J | K | L | S | T | W | X | Y | Z
m_modifier: S | F
g_command: G g_modifier | g_command g_modifier
m_command: M m_modifier | m_command m_modifier
command: g_command | m_command | COMMENT
```
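To check what these rules accept without setting up PLY, here is a hand-written Python sketch that applies the same grammar to `(type, value)` token tuples. It is an illustration of the rules, not PLY output; `group_commands` and `allow_bare` are hypothetical names.

```python
G_MODIFIERS = set("FHIJKLSTWXYZ")  # g_modifier
M_MODIFIERS = set("SF")            # m_modifier


def group_commands(tokens, allow_bare=False):
    """Group (type, value) tokens into commands, following:

        g_command: G g_modifier | g_command g_modifier
        m_command: M m_modifier | m_command m_modifier
        command:   g_command | m_command | COMMENT

    Returns a list of commands, each a list of tokens.
    """
    commands = []
    allowed = set()  # modifiers that may extend the most recent command
    for kind, value in tokens:
        if kind == "COMMENT":
            commands.append([(kind, value)])
            allowed = set()  # a comment is a complete command on its own
        elif kind in ("G", "M"):
            commands.append([(kind, value)])  # a new command starts here
            allowed = G_MODIFIERS if kind == "G" else M_MODIFIERS
        elif kind in allowed:
            commands[-1].append((kind, value))
        else:
            raise SyntaxError(f"unexpected {kind}{value}")
    if not allow_bare:
        for cmd in commands:
            # The grammar requires at least one modifier after G or M.
            if cmd[0][0] != "COMMENT" and len(cmd) == 1:
                raise SyntaxError(f"{cmd[0][0]}{cmd[0][1]} has no modifier")
    return commands
```

Note that with `allow_bare=False` the question's own `G21` line is rejected, just as the grammar above would reject it. If bare codes should be legal, the grammar equivalent is `g_command: G | g_command g_modifier` (and likewise for `m_command`).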

If you want to split those into sequences using the presence of a Z modifier, that can be done. You might want the lexer to produce two different Z token types based on the sign of the argument, because the parser can only make syntax decisions based on token types, not on semantic values.
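One way to make that split without changing the lexer rules is a small filter between the lexer and the parser that renames each Z token according to its value. This sketch assumes the work surface is at Z=0, so Z at or below the surface counts as "down" and anything above as "up" (in the question's example, Z0.0 penetrates and Z1.0 lifts). `ZDOWN`, `ZUP` and `classify_z` are hypothetical names; in PLY the new token names would also have to be declared in the `tokens` list.

```python
def classify_z(tokens, surface=0.0):
    """Rename each Z token to ZDOWN (at or below the surface) or ZUP (above it).

    Tokens are (type, value) pairs; every other token passes through unchanged.
    """
    for kind, value in tokens:
        if kind == "Z":
            kind = "ZDOWN" if value <= surface else "ZUP"
        yield kind, value
```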

Your question provides at least two different definitions of a block, making it a bit difficult to provide a clear answer.

"That is, we move to a location above the work surface, penetrate, cut a path then lift off. I would call this a 'block' or a 'path' and it is this that I'm interested in."

That would be, for example:
```
G00 X94.7 Y-14.9 (Move)
G01 Z0.0 (Penetrate)
G01 X97.2 Y-14.8 G03 X98.0 Y-14.2 I-0.02 J0.7 (Path)
G00 Z1.0 (Lift)
```
But later you say, "a block is any set of 'commands' that come after a Z down and before a Z up."

That would be just this part of the previous example:
```
G01 X97.2 Y-14.8 G03 X98.0 Y-14.2 I-0.02 J0.7 (Path)
```

Those are both possible, but obviously different. Here are some possible building blocks:

```
# This list doesn't include Z words
g_modifier: F | H | I | J | K | L | S | T | W | X | Y
g_command_no_z: G g_modifier
              | g_command_no_z g_modifier

# This doesn't distinguish between Z up and Z down. If you want that to
# affect syntax, you need two different Z tokens, and then two different
# with_z non-terminals.
g_command_with_z: G Z
                | g_command_no_z Z
                | g_command_with_z g_modifier

# You might or might not want this.
# It's a non-empty sequence of commands (comments, M commands, or
# G commands) that contain no Z words.
path: command_no_z
    | path command_no_z
command_no_z: COMMENT
            | m_command
            | g_command_no_z
```
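The block split itself can also be prototyped outside YACC. This sketch implements the second definition (the commands strictly between a Z down and the next Z up). It assumes commands have already been grouped into lists of `(type, value)` tokens and that Z words have been renamed to two distinct tokens, here the hypothetical `ZDOWN` and `ZUP`, as suggested above. A further Z down while already cutting is kept as part of the path, a Z up with no preceding Z down is ignored, and input that ends with the tool down is treated as an error; all three are design assumptions.

```python
def split_blocks(commands):
    """Return the commands strictly between each Z down and the next Z up.

    `commands` is a list of commands, each a list of (type, value) tokens in
    which Z words have already been renamed to ZDOWN or ZUP.
    """
    blocks = []
    current = None  # None while the tool is up; a list while it is cutting
    for cmd in commands:
        kinds = {kind for kind, _ in cmd}
        if "ZUP" in kinds:
            if current is not None:
                blocks.append(current)  # lift: close the block, excluding the lift
            current = None
        elif "ZDOWN" in kinds:
            if current is None:
                current = []            # plunge: open a block, excluding the plunge
            else:
                current.append(cmd)     # deeper plunge while cutting: part of the path
        elif current is not None:
            current.append(cmd)         # path command or comment while cutting
        # Commands issued while the tool is up belong to no block.
    if current is not None:
        raise SyntaxError("input ended while the tool was still down")
    return blocks
```

To get the first definition instead (move, penetrate, path, lift), the same loop would keep the plunge and lift commands and the positioning move that precedes the plunge.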