Tags: ruby, parsing, treetop

How to parse multiple lines using Ruby Treetop?


I am new to Ruby and Treetop.

I went through this tutorial and came up with the following set of rules.

grammar Sexp

  rule body
    commentPortString   # I am stuck here
  end

  rule interface
    space? (intf / intfWithSize) space? ('\n' / end_of_file) <Interface>
  end

  rule commentPortString
    space? '//' space portString space? ('\n' / end_of_file) <CommentPortString>
  end

  rule portString
    'Port' space? '.' <PortString>
  end

  rule expression
    space? '(' body ')' space? <Expression>
  end

  rule intf
    (input / output) space wire:wireName space? ';' <Intf>
  end

  rule intfWithSize
    (input / output) space? width:ifWidth space? wire:wireName space? ';' <IntfWithSize>
  end

  rule input
    'input'
  end

  rule output
    'output'
  end

  rule ifWidth
    '[' space? msb:digits space? ':' space? lsb:digits ']' <IfWidth>
  end

  rule digits
    [0-9]+
  end

  rule integer
    ('+' / '-')? [0-9]+ <IntegerLiteral>
  end

  rule float
    ('+' / '-')? [0-9]+ (('.' [0-9]+) / ('e' [0-9]+)) <FloatLiteral>
  end

  rule string
    '"' ('\"' / !'"' .)* '"' <StringLiteral>
  end

  rule signalTypeString
    '"' if_sig_name:signalType '"' <SignalTypeString>
  end

  rule signalType
    [a-zA-Z] [a-zA-Z0-9_]* (receiveLiteral / transmitLiteral) <SignalType>
  end

  rule receiveLiteral
    '.receive'
  end

  rule transmitLiteral
    '.transmit'
  end

  rule identifier
    [a-zA-Z\=\*] [a-zA-Z0-9_\=\*]* <Identifier>
  end

  rule wireName
    [a-zA-Z] [a-zA-Z0-9_]* <WireName>
  end

  rule non_space
    !space .
  end

  rule space
    [\s\t]+
  end

  rule newLine
    [\n\r]+
  end

  rule end_of_file
    !.
  end

end

I want the parser to extract blocks such as the one below. A block always starts with the // Port. line and ends with a blank line.

    // Port.
    output        send;
    input         free;
    output        fgcg;
    output[  2:0] state_id;
    output[  1:0] stream_id;
`ifdef SIMULATION
    output[ 83:0] dbg_id;
`endif

The rules above match each line of the text when it is parsed on its own, but I am unable to extract the whole block. I also want to extract only the matching text and ignore everything else.
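One way to handle "extract the matching text and ignore the rest" is to locate each candidate block first and hand only that text to the parser. Below is a minimal plain-Ruby sketch of that locating step (no Treetop involved), assuming, as described above, that a block starts at a // Port. line and ends at the first blank line or at the end of the input. The helper name extract_port_blocks is hypothetical. Each returned string still has to parse, so lines such as `ifdef SIMULATION would need a grammar rule of their own.

```ruby
# Hypothetical helper (not part of Treetop): collect each "// Port." block,
# from its marker line up to the first blank line or the end of the input.
# All text outside those blocks is ignored.
def extract_port_blocks(text)
  blocks = []
  current = nil # lines of the block being collected, or nil when outside one

  text.each_line do |line|
    if current.nil?
      # Start a new block only at a "// Port." comment line
      current = [line] if line.match?(%r{\A\s*//\s*Port\s*\.})
    elsif line.strip.empty?
      # A blank (or whitespace-only) line closes the block; it is not included
      blocks << current.join
      current = nil
    else
      current << line
    end
  end

  blocks << current.join if current # a block that runs to the end of input
  blocks
end

# Usage: only the "// Port." block is kept; the surrounding lines are dropped
sample = <<~TEXT
  module top;
  // Port.
  output        send;
  input         free;

  wire unrelated;
TEXT

blocks = extract_port_blocks(sample)
puts blocks.length
puts blocks.first
```

Doing the block detection outside the grammar keeps the Treetop rules focused on the structure of a single block.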

Can someone point me in the right direction, please?


Solution

  • Is something like the grammar below what you're looking for? It's hard to understand your problem fully without a little more information.

    Your space rule uses \s, which already matches \n, so space? swallows the line break and a following \n can never match. If you change the space rule to [^\S\n]+, it matches any whitespace except \n, so you can then look for the newline explicitly.

    If the Port. block ends with a completely blank line, match that line explicitly with "\n" / end_of_file, as the portEnd rule below does.

    Hope that makes sense...

    grammar Sexp
    
      rule body
        commentPortString interface* portEnd
      end
    
      rule interface
        space? (intf / intfWithSize) space? "\n" <Interface>
      end
    
      rule commentPortString
        space? '//' space? portString space? "\n" <CommentPortString>
      end
    
      rule portString
        'Port' space? '.' <PortString>
      end
    
      # Port block ends with a blank line
      rule portEnd
        "\n" / end_of_file
      end
    
      rule expression
        space? '(' body ')' space? <Expression>
      end
    
      rule intf
        (input / output) space wire:wireName space? ';' <Intf>
      end
    
      rule intfWithSize
        (input / output) space? width:ifWidth space? wire:wireName space? ';' <IntfWithSize>
      end
    
      rule input
        'input'
      end
    
      rule output
        'output'
      end
    
      rule ifWidth
        '[' space? msb:digits space? ':' space? lsb:digits ']' <IfWidth>
      end
    
      rule digits
        [0-9]+
      end
    
      rule integer
        ('+' / '-')? [0-9]+ <IntegerLiteral>
      end
    
      rule float
        ('+' / '-')? [0-9]+ (('.' [0-9]+) / ('e' [0-9]+)) <FloatLiteral>
      end
    
      rule string
        '"' ('\"' / !'"' .)* '"' <StringLiteral>
      end
    
      rule signalTypeString
        '"' if_sig_name:signalType '"' <SignalTypeString>
      end
    
      rule signalType
        [a-zA-Z] [a-zA-Z0-9_]* (receiveLiteral / transmitLiteral) <SignalType>
      end
    
      rule receiveLiteral
        '.receive'
      end
    
      rule transmitLiteral
        '.transmit'
      end
    
      rule identifier
        [a-zA-Z\=\*] [a-zA-Z0-9_\=\*]* <Identifier>
      end
    
      rule wireName
        [a-zA-Z] [a-zA-Z0-9_]* <WireName>
      end
    
      rule non_space
        !space .
      end
    
      rule space
        [^\S\n]+
      end
    
      rule newLine
        [\n\r]+
      end
    
      rule end_of_file
        !.
      end
    
    end
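To see the difference between the two space rules outside Treetop, you can check the same character classes as plain Ruby regular expressions. Treetop compiles character classes to Ruby regexes, so they should behave the same way inside the grammar. The variable names old_space and new_space are only for illustration, and the \A...\z anchors are added here so that each sample must be matched in full.

```ruby
# The question's rule: \s already covers space, tab, \r, \f, \v and \n,
# so the extra \t is redundant and newlines are consumed as "space"
old_space = /\A[\s\t]+\z/

# The answer's rule: "not non-whitespace and not \n", i.e. whitespace minus \n
new_space = /\A[^\S\n]+\z/

samples = {
  "spaces"        => "   ",
  "tab"           => "\t",
  "newline"       => "\n",
  "space+newline" => "  \n"
}

samples.each do |name, text|
  puts format("%-14s old=%-5s new=%s", name, old_space.match?(text), new_space.match?(text))
end
```

Both rules accept plain spaces and tabs; they differ only when a newline is involved, which is exactly where the original grammar lost the \n it was trying to match.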