pest grammar failing to parse, I cannot work out why


I have simplified the grammar down to this:

class_var = { kind ~ type ~ name ~ ";" }
kind      = { "static" | "field" }
type      = { "int" | "char" | "bool" | class_name }
class_name = {id}
name       =  { id }
id         =  { ASCII_ALPHA ~ ASCII_ALPHA* }
WHITESPACE = _{ " " | "\t" | "\n" }

I am trying to parse this (it's a field declaration inside a class; the type can be either a known type or a user-defined class type):

field x f;

produces

 --> 1:10
  |
1 | field x f;
  |          ^---
  |
  = expected id

Works fine with

field int f;

Solution

  • ASCII_ALPHA matches only 'a'..'z' | 'A'..'Z'. So in an identifier such as f1, the first character is valid for id but the second is not. You likely want to use ASCII_ALPHANUMERIC instead for the remaining characters.

    id = { ASCII_ALPHA ~ ASCII_ALPHANUMERIC* }
    

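    Additionally, you should make this rule atomic (@{ ... }). In a normal rule, pest implicitly allows WHITESPACE between the elements of a sequence and between repetitions, so the non-atomic id matches "x f" as a single identifier. The parser then expects another id for name but finds ";", which is the error reported at 1:10.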

    For more complete coverage, consider using the XID_START and XID_CONTINUE Unicode properties instead. They were defined for exactly this purpose: they classify which characters, including non-ASCII ones, may start or continue an identifier.

    id = @{ ( XID_START ~ XID_CONTINUE* ) | ( "_" ~ XID_CONTINUE+ ) }
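
    Putting these suggestions together, the grammar from the question might look like the sketch below. It assumes identifiers are ASCII and may contain digits after the first letter. Only the id rule changes, and the comment is added for illustration. Rule names are kept as in the question; if your pest setup rejects type as a Rust keyword, rename that rule (var_type would be one hypothetical choice).

    class_var  = { kind ~ type ~ name ~ ";" }
    kind       = { "static" | "field" }
    type       = { "int" | "char" | "bool" | class_name }
    class_name = { id }
    name       = { id }
    // Atomic (@): no implicit WHITESPACE inside id, so "x f" is read as two ids.
    id         = @{ ASCII_ALPHA ~ ASCII_ALPHANUMERIC* }
    WHITESPACE = _{ " " | "\t" | "\n" }

    With this version, field x f; should parse with x as the class_name and f as the name, because id now stops at the first whitespace character.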