
Pyparsing: how to parse similar-looking delimited and non-delimited strings


How can I parse the two types of strings below using two separate parsers - one for each pattern?

from pyparsing import *    
dd = """
  wire         c_f_g;
  wire         cl_3_f_g4;

   x_y abc_d
      (.c_l (cl_dclk_001l),
       .c_h (cl_m1dh_ff),
       .ck     (b_f_1g));
"""

I can parse each pattern on its own using the parsers below (wire lines first, then the nested instance):

# For the lines containing wire
printables_less_semicolon = printables.replace(';','')
wireDef = Literal("wire") + Word( printables)

# For the nested pattern
instanceStart = Word( printables ) + Word( printables_less_semicolon )
u = nestedExpr(opener="(", closer=")", ignoreExpr=dblSlashComment)
t = OneOrMore(instanceStart + u + Word( ";" ) + LineEnd())
print(instanceStart.parseString(dd))

If you run the above code, the instanceStart parser matches the wire line, because Word(printables) accepts "wire" just like any other token. How can I reliably differentiate between the two?


Solution

  • I have a solution that works (definitely not the best).

    printables_less_semicolon = printables.replace(';','')
    
    bracketStuff    = Group(QuotedString("(", escChar=None, multiline=True, endQuoteChar=");"))
    ifDef           = Group(QuotedString("`ifdef", endQuoteChar="`endif", multiline=True))
    theEnd          = Word( "endmodule" )
    nestedConns     = Group(nestedExpr(opener="(", closer=")", ignoreExpr=dblSlashComment))
    instance        = Regex(r'[\s?|\r\n?].*\(')
    othersWithSc    = Group(Word (printables) + Word (printables_less_semicolon) + Literal(";"))
    othersWithoutSc = Word (printables) + Word (printables_less_semicolon) + NotAny(Literal(";"))
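
One thing worth checking in the definitions above: Word treats its argument as a set of allowed characters, not as a literal string, so Word("endmodule") accepts any run of the letters e, n, d, m, o, u and l. The short sketch below contrasts it with Keyword; the names loose and strict are only illustrative and are not part of the parsers above.

```python
from pyparsing import Keyword, ParseException, Word

# Word("endmodule") means "one or more characters drawn from e, n, d, m, o, u, l".
loose = Word("endmodule")
# Keyword("endmodule") matches only the whole token "endmodule".
strict = Keyword("endmodule")

# "module" is built entirely from the allowed characters, so Word accepts it.
print(loose.parseString("module").asList())

# Keyword refuses anything other than the exact token.
try:
    strict.parseString("module")
except ParseException as err:
    print("Keyword rejected the input: {}".format(err))

print(strict.parseString("endmodule").asList())
```

For the sample input this makes no difference, since the only token reaching theEnd is endmodule itself, but Keyword (or Literal) may be the safer choice on other files.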
    

    A combination of the above parsers allows me to parse files in the format that I am dealing with. An example input:

    ts2 = """
    module storyOfFox ( andDog, 
            JLT);
      input andDog;
      output JLT;
    
       `ifdef quickFox
     `include "gatorade"
     `include "chicken" 
    `endif 
    
       wire         hello;
       wire         and;
       wire         welcome;
    
    
       the quick
          (.brown (fox),
           .jumps (over),
           .the (lazy),
           .dog    (and),
           .the (dog),
           .didNot    (likeIt));
    
        theDog thenWent
          (// Waiver unused
           .on (),
           // Waiver unused
           .to (),
           .sueThe (foxFor),
           .jumping (andBeingTooQuick),
           .TheDog    (wasHailedAsAHero),
           .endOf (Story));
    
    
      endmodule
    """
    

    A parser that handles the input above:

    try:
        # Earlier attempt; it is overwritten by the second assignment to tp below.
        tp  = othersWithoutSc + Optional(bracketStuff) + Optional(ZeroOrMore(othersWithSc)) + Optional( Group( ZeroOrMore( othersWithoutSc + nestedConns ) ) ) + theEnd
        # Instances: two words, a nested connection list, then ";"
        tpI = Group( ZeroOrMore( othersWithoutSc + nestedConns +  Word( ";" ) ) )
        # Declarations ending in ";" and the `ifdef block, in either order
        tpO = Each( [Optional(ZeroOrMore(othersWithSc)), Optional(ifDef)] )
        tp  = othersWithoutSc + Optional(bracketStuff) + tpO + Group(tpI) + theEnd
        #print(othersWithoutSc.parseString("input xyz;"))
        print(tp.parseString(ts2))
    except ParseException as x:
        print("Line {e.lineno}, column {e.col}:\n'{e.line}'".format(e=x))
    

    Output obtained:

    module
    storyOfFox
    [' andDog, \n        JLT']
    ['input', 'andDog', ';']
    ['output', 'JLT', ';']
    [' quickFox\n `include "gatorade"\n `include "chicken" \n']
    ['wire', 'hello', ';']
    ['wire', 'and', ';']
    ['wire', 'welcome', ';']
    [['the', 'quick', [['.brown', ['fox'], ',', '.jumps', ['over'], ',', '.the', ['lazy'], ',', '.dog', ['and'], ',', '.the', ['dog'], ',', '.didNot', ['likeIt']]], ';', 'theDog', 'thenWent', [['// Waiver unused', '.on', [], ',', '// Waiver unused', '.to', [], ',', '.sueThe', ['foxFor'], ',', '.jumping', ['andBeingTooQuick'], ',', '.TheDog', ['wasHailedAsAHero'], ',', '.endOf', ['Story']]], ';']]
    endmodule
    

    I do not want to accept this answer just yet, as I may not have solved the real problem I was facing. I just found a way to work around it and get the output I need.
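
For the original question (telling wire lines apart from instances), one possible approach is to make wire a Keyword and to exclude it explicitly from the start of an instance with a negative lookahead. This is only a sketch under some assumptions: names are plain identifiers (letters, digits, underscores), every statement ends in ";", and identifier, wire_def, instance and statement are names invented for this example rather than anything from the code above.

```python
from pyparsing import (Group, Keyword, Suppress, Word, ZeroOrMore,
                       alphanums, alphas, dblSlashComment, nestedExpr)

# Assumption: names consist of letters, digits and underscores only.
identifier = Word(alphas + "_", alphanums + "_")
SEMI = Suppress(";")

# A wire declaration: the keyword "wire", a name, then ";".
wire_def = Group(Keyword("wire") + identifier + SEMI)

# An instance: two names (the first must not be the keyword "wire"),
# a parenthesised connection list, then ";".
instance = Group(~Keyword("wire") + identifier + identifier
                 + nestedExpr("(", ")", ignoreExpr=dblSlashComment) + SEMI)

# Try the more specific pattern first, then fall back to the instance.
statement = wire_def | instance
parser = ZeroOrMore(statement)

dd = """
  wire         c_f_g;
  wire         cl_3_f_g4;

   x_y abc_d
      (.c_l (cl_dclk_001l),
       .c_h (cl_m1dh_ff),
       .ck     (b_f_1g));
"""

result = parser.parseString(dd, parseAll=True)
for item in result:
    # The first two tokens of each group identify what was matched.
    print(item[0], item[1])
```

Because Keyword only matches a complete token, an instance whose module name merely starts with "wire" would still be treated as an instance, and the ~Keyword("wire") lookahead keeps the instance parser from swallowing a wire line when it is used on its own.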