Search code examples
cparsingstringbnf

String Manipulation in C


I am helping my nephew for his C lab homework, it is a string manipulation assignment and applying Wang's algorithm.

Here is the BNF representation for the input.

<s> ::= <l> # <r>

<l> ::= <list>| ε
<r> ::= <list>| ε
<list> ::= <f>|<f> , <list>
<f> ::= <letter>| - <f>| (<f><op><f>)
<op> ::= & | | | >
<letter> ::= A |... | Z

What is the best practice to handle and parse this kind of input in C? How can I parse this structure without using struct? Thanks in advance.


Solution

  • The most straightforward approach is to make every rule (or "production") a function. This is called a "recursive descent" parser.

    Write two routine that will peek at and get the next character as well.

    This will give you some code that looks something like this (in pseudocode):

    // <sequent> ::= <lhs> # <rhs>
    sequent()
        lhs()
        if peekchar() != '#' then error
        else poundsign = nextchar()
        rhs()
    
    
    // <lhs> ::= <formulalist>| ε
    lhs()
        if peekchar() == EOF
            return
        else
           formula()
    
    // <rhs> ::= <formulalist>| ε
    rhs()
        if peekchar() == EOF
            return
        else
           formulalist()
    
    // <formulalist> ::= <formula>|<formula> , <formulalist>
    formulalist()
       formula()
       if peekchar() is ','
           comma = nextchar()
           return formulalist()
    
    // <formula> ::= <letter>| - <formula>| (<formula><infixop><formula>)
    formula()
        next = peekchar()
        if next in A..Z
            letter
        else if next is -
            minus_sign = nextchar()
            return formula()
        else
            formula()
            infixop()
            formula()
    
    // <infixop> ::= & | | | >
    infixop()
        c = nextchar()
        if c not in &,|,> then error
    
    // <letter> ::= A | B | ... | Z
    letter()
        c = nextchar()
        if c not A..Z then error
    

    and so forth, for each rule.

    The general idea:

    • each rule is a function
    • at certain points the function peeks ahead to see what to do. for example, formula() checks to see if the first character is a minus sign.