Simplest of parsers in go tool yacc


I'm running this command:

go tool yacc -p Verb -o verb.go boilerplate.y

to build this yacc file:

// boilerplate.y
%{

package main

import (
    "bufio"
    "fmt"
    "os"
    "unicode"
)

%}

%% 

.|\n   ECHO;

%%

func main() {
    fi := bufio.NewReader(os.NewFile(0, "stdin"))
    s, err := fi.ReadString('\n')
    if err != nil {
        fmt.Println("error", err)
    }

    VerbParse(&VerbLex{s: s})
}

Error: bad syntax on first rule: boilerplate.y:16

Successfully got this example to work:

https://github.com/golang-samples/yacc/blob/master/simple/calc.y

I'm trying to build my own parser while working through the lex & yacc book. Resources seem limited to non-existent.


Solution

  • One of the rules in your specification is incorrect.

    A yacc specification file has the following layout:

    declarations
    %%
    rules
    %%
    programs
    

    Where a rule is defined as:

    A  :  BODY  ;
    

    Where A is a non-terminal symbol, while BODY is made up of tokens (terminal symbols), non-terminals and literals. The : and ; are required components of rule declaration syntax.

    Hence the rule:

    .|\n   ECHO;
    

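    is syntactically incorrect. It is a lex-style pattern/action pair (match any character or a newline, then echo it), not a yacc grammar rule, and go tool yacc does not generate a lexer for you.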

    Since you are simply trying to echo the input, a very simple implementation based on calc.y would be the following (file echo.y):

    rules

    %%
    
    in : /* empty */
      | in input '\n'
         { fmt.Printf("Read character: %s\n", $2) }
      ;
    
    input : CHARACTER
      | input CHARACTER
          { $$ = $1 + $2 }
      ;
    

    program

    %%
    
    type InputLex struct {
        // contains one complete input string (with the trailing \n)
        s string
        // used to keep track of parser position along the above input string
        pos int
    }
    
    func (l *InputLex) Lex(lval *InputSymType) int {
        var c rune = ' '
    
        // skip through all the spaces, both at the ends and in between
        for c == ' ' {
            if l.pos == len(l.s) {
                return 0
            }
            c = rune(l.s[l.pos])
            l.pos++
        }
    
        // only look for input characters that are either digits or lower case
        // to do more specific parsing, you'll define more tokens and have a 
        // more complex parsing logic here, choosing which token to return
        // based on parsed input
        if unicode.IsDigit(c) || unicode.IsLower(c) {
            lval.val = string(c)
            return CHARACTER
        }
    
        // anything else, including the trailing '\n', is returned as a literal
        // token (its character code); a character the grammar does not expect
        // results in a syntax error
        return int(c)
    }
    
    func (l *InputLex) Error(s string) {
        fmt.Printf("syntax error: %s\n", s)
    }
    
    func main() {
        // same as in calc.y
    }
    
    func readline(fi *bufio.Reader) (string, bool) {
        // same as in calc.y
    }
    

    To compile and run this program, run the following at the command prompt:

    go tool yacc -o echo.go -p Input echo.y
    go run echo.go
    

    As you can see, you have to write the tokenizing logic yourself in the Lex method. The struct InputLex holds the input string and the current position while it is being parsed. InputSymType is auto-generated from the %union declared in the declarations part of the specification.
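
    To see what the parser receives from Lex, the lexer above can be exercised on its own, without running go tool yacc. This is only a sketch: InputSymType and CHARACTER are hand-written stand-ins for what the tool would generate (assuming a declarations section along the lines of %union { val string } and %token CHARACTER), the numeric value of CHARACTER is illustrative, and the tokens helper is a hypothetical name that is not part of the original answer.

```go
package main

import (
	"fmt"
	"unicode"
)

// CHARACTER stands in for the token constant the yacc tool would generate.
// The value is illustrative; it only needs to be outside the range of
// single characters so it cannot clash with literal tokens such as '\n'.
const CHARACTER = 57346

// InputSymType stands in for the struct generated from the %union
// (assumed here to have a single string field named val).
type InputSymType struct {
	val string
}

// InputLex is the lexer state from the answer: the input line and a cursor.
type InputLex struct {
	s   string
	pos int
}

// Lex returns the next token: 0 at end of input, CHARACTER for a digit or
// lower-case letter, and the character code itself for anything else.
func (l *InputLex) Lex(lval *InputSymType) int {
	c := ' '
	for c == ' ' {
		if l.pos == len(l.s) {
			return 0
		}
		c = rune(l.s[l.pos])
		l.pos++
	}
	if unicode.IsDigit(c) || unicode.IsLower(c) {
		lval.val = string(c)
		return CHARACTER
	}
	return int(c)
}

// tokens (hypothetical helper) calls Lex until it reports end of input,
// which is roughly what the generated parser does, and collects the results.
func tokens(s string) []int {
	l := &InputLex{s: s}
	var out []int
	for {
		var lval InputSymType
		t := l.Lex(&lval)
		if t == 0 {
			return out
		}
		out = append(out, t)
	}
}

func main() {
	for _, t := range tokens("a1 b\n") {
		fmt.Println(t)
	}
}
```

    Spaces are skipped, each digit or lower-case letter becomes one CHARACTER token, and the newline comes through as its own character code, which is what lets the '\n' literal in the grammar match.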

    As far as I can tell, there is no way to directly use JISON or a regex to do the matching with Go's yacc tool. You may have to look at other libraries for that.
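
    That said, a hand-written Lex method is ordinary Go, so it can call the standard regexp package itself. The sketch below is an assumption about how that might look, not something from the original answer: wordRe and nextWord are hypothetical names, and the pattern (a run of digits or lower-case letters) is chosen only to mirror the characters the lexer above accepts.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRe is a hypothetical token pattern: one or more digits or
// lower-case letters, anchored at the start of the remaining input.
var wordRe = regexp.MustCompile(`^[0-9a-z]+`)

// nextWord returns the text matched at the start of s, the unconsumed
// remainder, and whether anything matched at all.
func nextWord(s string) (word, rest string, ok bool) {
	m := wordRe.FindString(s)
	if m == "" {
		return "", s, false
	}
	return m, s[len(m):], true
}

func main() {
	word, rest, ok := nextWord("abc123 def\n")
	fmt.Printf("%q %q %v\n", word, rest, ok)
}
```

    A Lex method built this way would store word in lval and return the matching token constant, advancing its position by len(word) instead of one character at a time.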

    More details can be found here: http://dinosaur.compilertools.net/yacc/

    Full working code here: https://play.golang.org/p/u1QxwRKLCl