I am trying to parse some .LIB files using pyparsing. I have some string structures that follow a similar layout; however, they contain variants that can change the required grammar.
TL;DR: I need to be able to skip over portions of a string up to the next token, which may optionally not be there at all.
Here is a snippet of a LIB file.
PIN EXAMPLE WITH NO TIMING
pin (core_c_sysclk ) {
clock : true ;
direction : input ;
capacitance : 0.0040;
max_transition : 0.1000;
related_ground_pin : "vss" ;
related_power_pin : "vcc" ;
fanout_load : 1.0000;
min_pulse_width_low : 0.1853;
min_pulse_width_high : 0.1249;
} /* End of pin core_c_sysclk */
bus (core_tx_td ){
bus_type : bus2 ;
/* Start of pin core_tx_td[9] */
PIN EXAMPLE WITH TIMING
pin (core_tx_td[9] ) {
direction : output ;
capacitance : 0.0005;
max_transition : 0.1000;
related_ground_pin : "vss" ;
related_power_pin : "vcc" ;
max_fanout : 15.0000;
max_capacitance : 0.1000;
/* Start of rising_edge arc of pin core_tx_td[9] wrt pin core_tx_tclk */
timing() { <----WHAT I WANT (to know if this is in the pin)
timing_type : rising_edge ;
timing_sense : non_unate ;
min_delay_arc : "true" ;
related_pin :" core_tx_tclk "; <----WHAT I WANT (core_tx_tclk in this case)
rise_transition (lut_timing_4 ){
values(\
REMOVED FOR CLARITY
);
}
fall_transition (lut_timing_4 ){
values(\
REMOVED FOR CLARITY
);
}
cell_rise (lut_timing_4 ){
values(\
REMOVED FOR CLARITY
);
}
cell_fall (lut_timing_4 ){
values(\
REMOVED FOR CLARITY
);
}
} /* End of rising_edge arc of pin core_tx_td[9] wrt pin core_tx_tclk */
.....More but not really needed for example
The main values of interest are the 'pin' name, the clock type, the direction, and, if timing() exists, the related pin.
So far, here is what I have for parsing the strings:
from pyparsing import Keyword, Optional, SkipTo, Word, alphanums
LP = '('
RP = ')'
LCB = '{'
RCB = '}'
COM = ','
#Pins/Signals
pin_dec = (Keyword('pin') + LP + Word(alphanums+'_/[]').setResultsName('name') + RP).setResultsName('pin_dec')
pin_clk = (Keyword('clock') + ':' + Word(alphanums+'_/').setResultsName('is_clk') + ';').setResultsName('pin_clk')
pin_dir = (Keyword('direction') + ':' + Word(alphanums+'_/').setResultsName('dir') + ';').setResultsName('pin_dir')
pin_arc = (Keyword('related_pin') + ':' + '"' + Word(alphanums+'_/[]').setResultsName('name') + '"' + ';').setResultsName('pin_arc')
pin_timing = (Keyword('timing') + LP + RP + LCB + SkipTo(pin_arc) + Optional(pin_arc)).setResultsName('pin_timing')
pin_end = Keyword('} /* End of pin') + SkipTo('*/')
pin = pin_dec + LCB + Optional(pin_clk) + Optional(pin_dir) + SkipTo(Optional(pin_timing)) + SkipTo(pin_end) + pin_end
The pin(), clock, and direction checks are straightforward and seem to work. My issue is with the pin_timing and pin_arc checks. In some cases, as seen in the example, there can be additional lines of information that are not needed. I tried to use SkipTo(pin_timing); however, the pin_timing element may not be present at all, in which case I want to skip it.
I have tried both Optional(SkipTo(pin_timing)) and SkipTo(Optional(pin_timing)), but neither gives me the proper results. Here is a snippet of the code I use to test the example string:
for bla in pin.searchString(test_str):
    print('========')
    print('Pin name: ' + bla.pin_dec.name)
    if bla.pin_dir:
        print('Pin Dir: ' + bla.pin_dir.dir)
    if bla.pin_clk:
        print('Pin Clk: ' + bla.pin_clk.is_clk)
    #if bla.pin_timing: just trying to print for debug
    print('Pin Timing: ' + bla.pin_timing)
Output is the following:
========
Pin name: core_c_sysclk
Pin Dir: input
Pin Clk: true
Pin Timing:
========
Pin name: core_tx_pwr_st[2]
Pin Dir: output
Pin Timing:
========
Pin name: core_tx_pwr_st[1]
Pin Dir: output
Pin Timing:
========
Pin name: core_tx_pwr_st[0]
Pin Dir: output
Pin Timing:
========
Pin name: core_tx_td[9]
Pin Dir: output
Pin Timing:
Setting debug on pin_timing (using pin_timing.setDebug()), I get the following output:
Match {"timing" "(" ")" "{" SkipTo:({"related_pin" ":" """ W:(abcd...) """ ";"}) [{"related_pin" ":" """ W:(abcd...) """ ";"}]} at loc 596(22,7)
Exception raised:Expected "timing" (at char 596), (line:22, col:7)
Based on this, it is raising the exception on the max_transition line. I haven't been able to understand why it's doing this, and I'm also wondering why it doesn't raise the same exception on the capacitance line. I'm guessing that I am using Optional + SkipTo incorrectly, so an example of skipping to an optional token, and bypassing it when it is not present, would be nice to see. I have looked through the pyparsing docs and several SO topics, but most of them didn't seem to answer this particular question.
I have wondered whether I need to extract the entire pin() string from the file first and then perform a second, recursive parse/search to extract the timing/related_pin, but I wanted to see if there was an easier solution before trying that.
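The two-pass idea from the previous paragraph can be sketched as follows. This is a minimal sketch, not a tested LIB parser, and it assumes every pin body has balanced braces and that comments contain no braces or quote characters. Pass 1 uses nestedExpr to match pin(...) plus its whole {...} body (nested timing() and cell_rise blocks included); scanString reports where each match starts and ends, and pass 2 runs small searchString calls over just that slice. The names pin_block, summarize, and the trimmed SAMPLE text are made up for the illustration.

```python
import pyparsing as pp

LP, RP = map(pp.Suppress, "()")
identifier = pp.Word(pp.alphanums + "_/[]")

# Pass 1 (hypothetical pin_block): pin header plus its balanced {...} body.
# nestedExpr handles inner braces and skips over quoted strings by default.
pin_block = pp.Keyword("pin") + LP + identifier("name") + RP + pp.nestedExpr("{", "}")

# Pass 2: small expressions that are only searched inside one pin's text.
pin_dir = pp.Keyword("direction") + ":" + identifier("dir") + ";"
pin_arc = (pp.Keyword("related_pin") + ":" + '"'
           + identifier("related_pin") + '"' + ";")


def summarize(text):
    rows = []
    # scanString yields (tokens, start, end); text[start:end] is one whole pin block
    for tokens, start, end in pin_block.scanString(text):
        block = text[start:end]
        dirs = pin_dir.searchString(block)
        arcs = pin_arc.searchString(block)
        rows.append((tokens.name,
                     dirs[0].dir if dirs else "",
                     [arc.related_pin for arc in arcs]))
    return rows


SAMPLE = """
pin (core_c_sysclk ) {
    clock : true ;
    direction : input ;
    related_power_pin : "vcc" ;
} /* End of pin core_c_sysclk */
pin (core_tx_td[9] ) {
    direction : output ;
    timing() {
        related_pin :" core_tx_tclk ";
        cell_rise (lut_timing_4 ){
            values("0.1, 0.2");
        }
    } /* End of rising_edge arc */
} /* End of pin core_tx_td[9] */
"""

for row in summarize(SAMPLE):
    print(row)
```

Each pin yields its name, its direction, and a list of related pins (empty when there is no timing() block). The trade-off is that the file is scanned twice, but neither pass needs SkipTo.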
Thanks
Optional and SkipTo usually require a little care when used together. SkipTo generally looks for its target expression without considering what other expressions come before or after it in a parser.
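The question's SkipTo(Optional(pin_timing)) is an extreme case of this. SkipTo tries its target at the current position first, and an Optional target always succeeds there (by matching nothing), so nothing is skipped and pin_timing is never consumed. That is consistent with the empty "Pin Timing:" lines in the question's output. The small sketch below shows the difference; marker, with_marker, and without_marker are illustrative stand-ins, not names from the question.

```python
import pyparsing as pp

# Illustrative stand-ins (not from the question): "timing" plays the role of
# pin_timing, which may or may not appear in the text.
marker = pp.Keyword("timing")
with_marker = "capacitance : 0.1; timing ( ) { }"
without_marker = "capacitance : 0.1;"

# SkipTo(Optional(marker)): the Optional target succeeds, matching nothing,
# at the very first position SkipTo tries, so nothing is skipped.
nothing = pp.SkipTo(pp.Optional(marker)).parseString(with_marker)

# Optional(SkipTo(marker)): SkipTo really scans ahead to "timing" ...
ahead = pp.Optional(pp.SkipTo(marker)).parseString(with_marker)

# ... and the Optional only matters when "timing" never shows up.
absent = pp.Optional(pp.SkipTo(marker)).parseString(without_marker)

print(repr(nothing[0]))  # ''
print(repr(ahead[0]))    # 'capacitance : 0.1; '
print(len(absent))       # 0
```

Note that SkipTo only positions the parser in front of its target (include=False by default); something after it still has to consume "timing".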
Here is an example that uses SkipTo to parse these lines:
a b c z
a d e 100 d z
Each line begins with 'a', ends with 'z', and has some intervening alphas and possibly an integer.
We can write this as:
import pyparsing as pp
start = pp.Char('a').setName('start')
end = pp.Char('z').setName('end')
num = pp.Word(pp.nums).setName('num')
And we'll use SkipTo, because who knows what else might be in there?
expr = (start
        + pp.Optional(pp.SkipTo(num) + num)
        + pp.SkipTo(end)
        + end)
Throw some tests at it:
expr.runTests("""
a b c z
a d e 100 d z
a 100 b d z
""")
And they all look pretty good:
a b c z
['a', 'b c ', 'z']
a d e 100 d z
['a', 'd e ', '100', 'd ', 'z']
a 100 b d z
['a', '', '100', 'b d ', 'z']
But if there can be multiple exprs, then SkipTo might skip too much:
pp.OneOrMore(pp.Group(expr)).runTests("""
a b c z
a d e 100 d z
a 100 b d z
# not what we want
a b c z a d e 100 d z
""")
Gives:
a b c z
[['a', 'b c ', 'z']]
[0]:
['a', 'b c ', 'z']
a d e 100 d z
[['a', 'd e ', '100', 'd ', 'z']]
[0]:
['a', 'd e ', '100', 'd ', 'z']
a 100 b d z
[['a', '', '100', 'b d ', 'z']]
[0]:
['a', '', '100', 'b d ', 'z']
# not what we want
a b c z a d e 100 d z
[['a', 'b c z a d e ', '100', 'd ', 'z']]
[0]:
['a', 'b c z a d e ', '100', 'd ', 'z']
The last test string shows SkipTo skipping right past the end of the first group until it hits '100' in the second group, so we get only one big group instead of two.
We need to tell SkipTo that it can't read past the end of the group while looking for num. To do this, use failOn:
expr = (start
        + pp.Optional(pp.SkipTo(num, failOn=end) + num)
        + pp.SkipTo(end)
        + end)
We want the skipping to fail if it hits the end expression before finding a num. Since we've marked this as optional, that's no problem, and now our test looks like:
pp.OneOrMore(pp.Group(expr)).runTests("""
# better
a b c z a d e 100 d z
""")
# better
a b c z a d e 100 d z
[['a', 'b c ', 'z'], ['a', 'd e ', '100', 'd ', 'z']]
[0]:
['a', 'b c ', 'z']
[1]:
['a', 'd e ', '100', 'd ', 'z']
Now, looking at your example, here is your grammar. I made some changes: mostly changing expr.setResultsName("some_name") to expr("some_name"), and wrapping your expressions in Group so that your hierarchical naming works, but most importantly, adding failOn in your optional SkipTo so that it won't skip past the pin_end expression:
from pyparsing import (Group, Keyword, Optional, SkipTo, Word,
                       alphanums, cStyleComment)
# LP, RP, LCB and RCB are the same as in the question

identifier = Word(alphanums+'_/[]')
pin_dec = Group(Keyword('pin') + LP + identifier('name') + RP)('pin_dec')
pin_clk = Group(Keyword('clock') + ':' + identifier('is_clk') + ';')('pin_clk')
pin_dir = Group(Keyword('direction') + ':' + identifier('dir') + ';')('pin_dir')
pin_arc = Group(Keyword('related_pin')
                + ':'
                + '"' + identifier('name') + '"'
                + ';')('pin_arc')
pin_timing = Group(Keyword('timing')
                   + LP + RP
                   + LCB
                   + SkipTo(pin_arc)
                   + Optional(pin_arc))('pin_timing')
pin_end = RCB + Optional(cStyleComment)
pin = Group(pin_dec
            + LCB
            + Optional(pin_clk)
            + Optional(pin_dir)
            + Optional(SkipTo(pin_timing, failOn=pin_end))
            + SkipTo(pin_end)
            + pin_end)

for parsed in pin.searchString(sample):
    print(parsed.dump())
    print()
Giving:
[[['pin', '(', 'core_c_sysclk', ')'], '{', ['clock', ':', 'true', ';'], ['direction', ':', 'input', ';'], 'capacitance : 0.0040;\n max_transition : 0.1000;\n related_ground_pin : "vss" ;\n related_power_pin : "vcc" ;\n fanout_load : 1.0000;\n min_pulse_width_low : 0.1853;\n min_pulse_width_high : 0.1249;', '', '}', '/* End of pin core_c_sysclk */']]
[0]:
[['pin', '(', 'core_c_sysclk', ')'], '{', ['clock', ':', 'true', ';'], ['direction', ':', 'input', ';'], 'capacitance : 0.0040;\n max_transition : 0.1000;\n related_ground_pin : "vss" ;\n related_power_pin : "vcc" ;\n fanout_load : 1.0000;\n min_pulse_width_low : 0.1853;\n min_pulse_width_high : 0.1249;', '', '}', '/* End of pin core_c_sysclk */']
- pin_clk: ['clock', ':', 'true', ';']
- is_clk: 'true'
- pin_dec: ['pin', '(', 'core_c_sysclk', ')']
- name: 'core_c_sysclk'
- pin_dir: ['direction', ':', 'input', ';']
- dir: 'input'
[[['pin', '(', 'core_tx_td[9]', ')'], '{', ['direction', ':', 'output', ';'], 'capacitance : 0.0005;\n max_transition : 0.1000;\n related_ground_pin : "vss" ;\n related_power_pin : "vcc" ;\n max_fanout : 15.0000;\n max_capacitance : 0.1000;\n\n /* Start of rising_edge arc of pin core_tx_td[9] wrt pin core_tx_tclk */\n ', 'timing() { <----WHAT I WANT (to know if this is in the pin)\n timing_type : rising_edge ;\n timing_sense : non_unate ;\n min_delay_arc : "true" ;\n related_pin :" core_tx_tclk "; <----WHAT I WANT (core_tx_tclk in this case)\n rise_transition (lut_timing_4 ){\n values( REMOVED FOR CLARITY\n );\n ', '}']]
[0]:
[['pin', '(', 'core_tx_td[9]', ')'], '{', ['direction', ':', 'output', ';'], 'capacitance : 0.0005;\n max_transition : 0.1000;\n related_ground_pin : "vss" ;\n related_power_pin : "vcc" ;\n max_fanout : 15.0000;\n max_capacitance : 0.1000;\n\n /* Start of rising_edge arc of pin core_tx_td[9] wrt pin core_tx_tclk */\n ', 'timing() { <----WHAT I WANT (to know if this is in the pin)\n timing_type : rising_edge ;\n timing_sense : non_unate ;\n min_delay_arc : "true" ;\n related_pin :" core_tx_tclk "; <----WHAT I WANT (core_tx_tclk in this case)\n rise_transition (lut_timing_4 ){\n values( REMOVED FOR CLARITY\n );\n ', '}']
- pin_dec: ['pin', '(', 'core_tx_td[9]', ')']
- name: 'core_tx_td[9]'
- pin_dir: ['direction', ':', 'output', ';']
- dir: 'output'
So you really were pretty close; you just needed to structure Optional and SkipTo correctly, and add failOn and some Groups. The rest is pretty much the way you had it.
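One caveat about the dump above: the second pin has neither a pin_timing nor a pin_arc entry, and its last token is a bare '}'. SkipTo(pin_timing, failOn=pin_end) stops just in front of timing() without consuming it, so the following SkipTo(pin_end) halts at the first } inside the timing block. A variant that mirrors the toy example's Optional(SkipTo(num, failOn=end) + num), skipping to the target and then consuming it, might look like the sketch below. It assumes, as in the sample, that every pin closes with "} /* End of pin ... */" so that pin_end cannot match an inner brace; timing_start, the suppressed punctuation, and the trimmed SAMPLE text are illustrative additions, not part of the grammar above.

```python
import pyparsing as pp

# Suppressed punctuation, so only the interesting tokens show up in results.
LP, RP, LCB, RCB = map(pp.Suppress, "(){}")
identifier = pp.Word(pp.alphanums + "_/[]")

pin_dec = pp.Keyword("pin") + LP + identifier("name") + RP
pin_clk = pp.Keyword("clock") + ":" + identifier("is_clk") + ";"
pin_dir = pp.Keyword("direction") + ":" + identifier("dir") + ";"
pin_arc = (pp.Keyword("related_pin") + ":" + '"'
           + identifier("related_pin") + '"' + ";")
timing_start = pp.Keyword("timing") + LP + RP + LCB  # hypothetical helper

# Assumption: a pin block always closes with "} /* End of pin ... */", as in
# the sample. Inner blocks close with a bare "}" or a different comment.
pin_end = RCB + pp.Regex(r"/\*\s*End of pin[^*]*\*/")

# Same shape as Optional(SkipTo(num, failOn=end) + num): skip ahead to the
# target AND consume it; if pin_end comes first, the whole Optional backs off.
pin_timing = (pp.SkipTo(timing_start, failOn=pin_end) + timing_start
              + pp.SkipTo(pin_arc, failOn=pin_end) + pin_arc)

pin = (pin_dec + LCB
       + pp.Optional(pin_clk)
       + pp.Optional(pin_dir)
       + pp.Optional(pin_timing)
       + pp.SkipTo(pin_end) + pin_end)

SAMPLE = """
pin (core_c_sysclk ) {
    clock : true ;
    direction : input ;
    capacitance : 0.0040;
    related_ground_pin : "vss" ;
} /* End of pin core_c_sysclk */
pin (core_tx_td[9] ) {
    direction : output ;
    capacitance : 0.0005;
    /* Start of rising_edge arc of pin core_tx_td[9] wrt pin core_tx_tclk */
    timing() {
        timing_type : rising_edge ;
        related_pin :" core_tx_tclk ";
        cell_rise (lut_timing_4 ){
            values("0.1, 0.2");
        }
    } /* End of rising_edge arc of pin core_tx_td[9] wrt pin core_tx_tclk */
} /* End of pin core_tx_td[9] */
"""

# Attribute-style access returns "" for names that were not matched.
summary = [(r.name, r.is_clk, r.dir, r.related_pin)
           for r in pin.searchString(SAMPLE)]
for row in summary:
    print(row)
```

If a LIB file does not emit the "End of pin" comments, pin_end needs some other way to tell the pin's closing brace from inner ones, for example matching the whole body with nestedExpr.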