How can I parse the two types of strings below using two separate parsers - one for each pattern?
from pyparsing import *
dd = """
wire c_f_g;
wire cl_3_f_g4;
x_y abc_d
(.c_l (cl_dclk_001l),
.c_h (cl_m1dh_ff),
.ck (b_f_1g));
"""
I am able to parse them independently using the parsers below (respectively):
# For the lines containing wire
printables_less_semicolon = printables.replace(';','')
wireDef = Literal("wire") + Word( printables)
# For the nested pattern
instanceStart = Word( printables ) + Word( printables_less_semicolon )
u = nestedExpr(opener="(", closer=")", ignoreExpr=dblSlashComment)
t = OneOrMore(instanceStart + u + Word( ";" ) + LineEnd())
print(instanceStart.parseString(dd))
If you run the above code, the instanceStart parser matches the wire line. How can I reliably differentiate between the two?
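One possible way to keep the two patterns apart is to treat `wire` as a reserved word and to forbid it at the start of an instance with a negative lookahead. The following is only a sketch: it assumes that names are plain identifiers (letters, digits, underscores), and the names `identifier`, `wire_def`, `instance`, and `statement` are illustrative, not taken from the code above.

```python
from pyparsing import (Group, Keyword, Suppress, Word, ZeroOrMore,
                       alphanums, alphas, dblSlashComment, nestedExpr)

# Assumption: names are identifier-like (letters, digits, underscores).
identifier = Word(alphas + "_", alphanums + "_")

# Keyword only matches "wire" as a whole word, so "wire_x" is not affected.
WIRE = Keyword("wire")
SEMI = Suppress(";")

# A wire declaration: the keyword, one name, and a closing semicolon.
wire_def = Group(WIRE + identifier + SEMI)

# An instance: two names, a parenthesised port list, and a semicolon.
# "~WIRE" is a negative lookahead that rejects lines starting with "wire".
port_list = nestedExpr(opener="(", closer=")", ignoreExpr=dblSlashComment)
instance = Group(~WIRE + identifier + identifier + port_list + SEMI)

statement = wire_def | instance
statements = ZeroOrMore(statement)

dd = """
wire c_f_g;
wire cl_3_f_g4;
x_y abc_d
(.c_l (cl_dclk_001l),
.c_h (cl_m1dh_ff),
.ck (b_f_1g));
"""

result = statements.parseString(dd, parseAll=True).asList()
for item in result:
    print(item)
```

With this arrangement each wire line should come back as its own `['wire', name]` group, and the instance should come back as a single group holding the two names followed by the nested port list. Further reserved words such as `input` or `output` could be excluded from `instance` the same way.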
I have a solution that works (definitely not the best).
printables_less_semicolon = printables.replace(';','')
bracketStuff = Group(QuotedString("(", escChar=None, multiline=True, endQuoteChar=");"))
ifDef = Group(QuotedString("`ifdef", endQuoteChar="`endif", multiline=True))
# Keyword matches the whole word; Word("endmodule") would match any run of those letters
theEnd = Keyword("endmodule")
nestedConns = Group(nestedExpr(opener="(", closer=")", ignoreExpr=dblSlashComment))
instance = Regex(r'[\s?|\r\n?].*\(')
othersWithSc = Group(Word (printables) + Word (printables_less_semicolon) + Literal(";"))
othersWithoutSc = Word (printables) + Word (printables_less_semicolon) + NotAny(Literal(";"))
A combination of the above parsers allows me to parse files in the format that I am dealing with. An example input:
ts2 = """
module storyOfFox ( andDog,
JLT);
input andDog;
output JLT;
`ifdef quickFox
`include "gatorade"
`include "chicken"
`endif
wire hello;
wire and;
wire welcome;
the quick
(.brown (fox),
.jumps (over),
.the (lazy),
.dog (and),
.the (dog),
.didNot (likeIt));
theDog thenWent
(// Waiver unused
.on (),
// Waiver unused
.to (),
.sueThe (foxFor),
.jumping (andBeingTooQuick),
.TheDog (wasHailedAsAHero),
.endOf (Story));
endmodule
"""
Parser that works to parse the above:
try:
    # First attempt; overwritten by the second definition of tp below
    tp = othersWithoutSc + Optional(bracketStuff) + Optional(ZeroOrMore(othersWithSc)) + Optional( Group( ZeroOrMore( othersWithoutSc + nestedConns ) ) ) + theEnd
    tpI = Group( ZeroOrMore( othersWithoutSc + nestedConns + Word( ";" ) ) )
    tpO = Each( [Optional(ZeroOrMore(othersWithSc)), Optional(ifDef)] )
    tp = othersWithoutSc + Optional(bracketStuff) + tpO + Group(tpI) + theEnd
    #print(othersWithoutSc.parseString("input xyz;"))
    print(tp.parseString(ts2))
except ParseException as x:
    print("Line {e.lineno}, column {e.col}:\n'{e.line}'".format(e=x))
Output obtained:
module
storyOfFox
[' andDog, \n JLT']
['input', 'andDog', ';']
['output', 'JLT', ';']
[' quickFox\n `include "gatorade"\n `include "chicken" \n']
['wire', 'hello', ';']
['wire', 'and', ';']
['wire', 'welcome', ';']
[['the', 'quick', [['.brown', ['fox'], ',', '.jumps', ['over'], ',', '.the', ['lazy'], ',', '.dog', ['and'], ',', '.the', ['dog'], ',', '.didNot', ['likeIt']]], ';', 'theDog', 'thenWent', [['// Waiver unused', '.on', [], ',', '// Waiver unused', '.to', [], ',', '.sueThe', ['foxFor'], ',', '.jumping', ['andBeingTooQuick'], ',', '.TheDog', ['wasHailedAsAHero'], ',', '.endOf', ['Story']]], ';']]
endmodule
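The port lists in the output above are flat token lists in which port names, connected nets, commas, and comments are mixed together. As a rough sketch of how they could be post-processed, the helper below (the name `ports_to_pairs` is hypothetical, not part of pyparsing) turns such a list into (port, net) pairs. It returns a list of pairs rather than a dict because a port name such as `.the` can appear more than once.

```python
def ports_to_pairs(port_tokens):
    """Turn a nestedExpr-style token list into (port, net) pairs.

    Strings starting with "." are port names; the list that follows a port
    name holds its net (an empty list means the port is unconnected).
    Commas and "//" comments are skipped.
    """
    pairs = []
    pending = None  # port name waiting for its connection list
    for tok in port_tokens:
        if isinstance(tok, list):
            if pending is not None:
                pairs.append((pending, tok[0] if tok else None))
                pending = None
        elif tok.startswith("."):
            pending = tok[1:]
    return pairs


# Part of the token list for the second instance in the output above
tokens = ['// Waiver unused', '.on', [], ',', '// Waiver unused', '.to', [], ',',
          '.sueThe', ['foxFor'], ',', '.jumping', ['andBeingTooQuick']]
print(ports_to_pairs(tokens))
# [('on', None), ('to', None), ('sueThe', 'foxFor'), ('jumping', 'andBeingTooQuick')]
```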
I do not want to accept this answer just yet, as I might not have solved the real problem that I was facing before. I just found a way to work around it and get the output I need.