Conditional SkipTo+Optional match


I am trying to parse some .LIB files using pyparsing. I have a scenario where I have some string structures that follow a similar layout, however there are variants inside that can change the required grammar.

TL;DR of issue: I need to be able to skip over portions of a string to the next token, which may or may not be there.

Here is a snippet of a LIB file.

PIN EXAMPLE WITH NO TIMING
pin (core_c_sysclk ) { 
  clock : true ; 
  direction : input ;
  capacitance :  0.0040;
  max_transition :  0.1000;
  related_ground_pin :   "vss" ;
  related_power_pin :   "vcc" ;
  fanout_load :  1.0000;
  min_pulse_width_low :  0.1853;
  min_pulse_width_high :  0.1249;

} /* End of pin core_c_sysclk */

bus (core_tx_td ){
  bus_type :  bus2 ;

  /* Start of pin core_tx_td[9] */ 
  PIN EXAMPLE WITH  TIMING
  pin (core_tx_td[9] ) { 
    direction : output ;
    capacitance :  0.0005;
    max_transition :  0.1000;
    related_ground_pin :   "vss" ;
    related_power_pin :   "vcc" ;
    max_fanout :  15.0000;
    max_capacitance :  0.1000;

    /* Start of rising_edge arc of pin core_tx_td[9] wrt pin core_tx_tclk */
    timing() {                        <----WHAT I WANT (to know if this is in the pin)
      timing_type : rising_edge ;
      timing_sense : non_unate ;
      min_delay_arc :   "true" ;
      related_pin :" core_tx_tclk ";  <----WHAT I WANT (core_tx_tclk in this case)
    rise_transition (lut_timing_4 ){
       values(\
        REMOVED FOR CLARITY
        );
      }
    fall_transition (lut_timing_4 ){
       values(\
        REMOVED FOR CLARITY
        );
      }
    cell_rise (lut_timing_4 ){
       values(\
        REMOVED FOR CLARITY
        );
      }
    cell_fall (lut_timing_4 ){
       values(\
        REMOVED FOR CLARITY
        );
      }
    } /* End of rising_edge arc of pin core_tx_td[9] wrt pin core_tx_tclk */
    .....More but not really needed for example

The main values of interest are the 'pin' name, clock type, direction, and IF the timing() exists, the related pin.

So far, here is what I have for parsing the strings:

LP          = '('
RP          = ')'
LCB         = '{'
RCB         = '}'
COM         = ','


#Pins/Signals
pin_dec       = (Keyword('pin') + LP + Word(alphanums+'_/[]').setResultsName('name') + RP).setResultsName('pin_dec')
pin_clk       = (Keyword('clock') + ':' + Word(alphanums+'_/').setResultsName('is_clk') + ';').setResultsName('pin_clk')
pin_dir       = (Keyword('direction') + ':' + Word(alphanums+'_/').setResultsName('dir') + ';').setResultsName('pin_dir')
pin_arc       = (Keyword('related_pin') + ':' + '"' + Word(alphanums+'_/[]').setResultsName('name') + '"' + ';').setResultsName('pin_arc')
pin_timing    = (Keyword('timing') + LP + RP + LCB + SkipTo(pin_arc) + Optional(pin_arc)).setResultsName('pin_timing')
pin_end       =  Keyword('} /* End of pin') + SkipTo('*/')
pin           = pin_dec + LCB + Optional(pin_clk) + Optional(pin_dir) + SkipTo(Optional(pin_timing))  + SkipTo(pin_end) + pin_end

The pin (), clock check, and direction check are straightforward and seem to work. My issue is with the pin_timing and pin_arc check. In some cases, as seen in the code, you can have additional lines of information that are not needed. I tried to use SkipTo(pin_timing); however, the pin_timing element might not be there, in which case I want to bypass it.

I have tried both Optional(SkipTo(pin_timing)) and SkipTo(Optional(pin_timing)), but neither of these seems to give me the proper results. Here is a snippet of the code to test out the example string:

for bla in pin.searchString(test_str):
  print('========')
  print('Pin name: ' + bla.pin_dec.name)
  if bla.pin_dir:
    print('Pin Dir: ' + bla.pin_dir.dir)
  if bla.pin_clk:
    print('Pin Clk: ' + bla.pin_clk.is_clk)
  #if bla.pin_timing: just trying to print for debug
  print('Pin Timing: ' + bla.pin_timing)

Output is the following:

========
Pin name: core_c_sysclk
Pin Dir: input
Pin Clk: true
Pin Timing: 
========
Pin name: core_tx_pwr_st[2]
Pin Dir: output
Pin Timing: 
========
Pin name: core_tx_pwr_st[1]
Pin Dir: output
Pin Timing: 
========
Pin name: core_tx_pwr_st[0]
Pin Dir: output
Pin Timing: 
========
Pin name: core_tx_td[9]
Pin Dir: output
Pin Timing: 

Setting debug on the pin_timing (using pin_timing.setDebug()), I get the following output:

Match {"timing" "(" ")" "{" SkipTo:({"related_pin" ":" """ W:(abcd...) """ ";"}) [{"related_pin" ":" """ W:(abcd...) """ ";"}]} at loc 596(22,7)
Exception raised:Expected "timing" (at char 596), (line:22, col:7)

Based on this, the exception is being raised on the max_transition line, and I haven't been able to understand why. I'm also wondering why it doesn't give the same exception on the capacitance line. I'm guessing that I am using Optional + SkipTo incorrectly, so an example of how to skip to an optional token, and bypass it if it's not there, would be nice to see. I have looked through the PyParsing docs and several SO topics, but most of those didn't seem to answer this particular question.

I have wondered if I need to get the entire pin() string from the file and then perform a recursive parse/search to extract the timing/related_pin, however I was going to see if there was an easier solution before trying that.

Thanks


Solution

  • Optional and SkipTo usually require a little care when used together. SkipTo generally looks for its target expression without considering what other expressions come before or after it in a parser.

    Here is an example. Using SkipTo to parse these lines:

    a b c z
    a d e 100 d z
    

    Each line begins with 'a', ends with 'z', and has some intervening alphas and possibly an integer.

    We can write this as:

    import pyparsing as pp

    start = pp.Char('a').setName('start')
    end = pp.Char('z').setName('end')
    num = pp.Word(pp.nums).setName('num')
    

    And we'll use SkipTo because who knows what else might be in there?

    expr = (start
            + pp.Optional(pp.SkipTo(num) + num)
            + pp.SkipTo(end)
            + end)
    

    Throw some tests at it:

    expr.runTests("""
        a b c z
        a d e 100 d z
        a 100 b d z
        """)
    

    And they all look pretty good:

    a b c z
    ['a', 'b c ', 'z']
    
    a d e 100 d z
    ['a', 'd e ', '100', 'd ', 'z']
    
    a 100 b d z
    ['a', '', '100', 'b d ', 'z']
    

    But if there can be multiple exprs, then SkipTo might skip too much:

    pp.OneOrMore(pp.Group(expr)).runTests("""
        a b c z
        a d e 100 d z
        a 100 b d z
    
        # not what we want
        a b c z a d e 100 d z
        """)
    

    Gives:

    a b c z
    [['a', 'b c ', 'z']]
    [0]:
      ['a', 'b c ', 'z']
    
    a d e 100 d z
    [['a', 'd e ', '100', 'd ', 'z']]
    [0]:
      ['a', 'd e ', '100', 'd ', 'z']
    
    a 100 b d z
    [['a', '', '100', 'b d ', 'z']]
    [0]:
      ['a', '', '100', 'b d ', 'z']
    
    # not what we want
    a b c z a d e 100 d z
    [['a', 'b c z a d e ', '100', 'd ', 'z']]
    [0]:
      ['a', 'b c z a d e ', '100', 'd ', 'z']
    

    The last test string shows SkipTo skipping right past the end of the first group until it hits '100' in the second group, and we only get one big group instead of two.

    We need to indicate to SkipTo that it can't read past the end of the group looking for num. To do this, use failOn:

    expr = (start
            + pp.Optional(pp.SkipTo(num, failOn=end) + num)
            + pp.SkipTo(end)
            + end)
    

    We want the skipping to fail if it hits the end expression before finding a num. Since we've said this is optional, it's no problem, and now our test looks like:

    pp.OneOrMore(pp.Group(expr)).runTests("""
        # better
        a b c z a d e 100 d z
        """)
    
    # better
    a b c z a d e 100 d z
    [['a', 'b c ', 'z'], ['a', 'd e ', '100', 'd ', 'z']]
    [0]:
      ['a', 'b c ', 'z']
    [1]:
      ['a', 'd e ', '100', 'd ', 'z']
    

    Now looking at your example, here is your grammar. I made some changes: I changed expr.setResultsName("some_name") to expr("some_name") and Grouped your expressions so that your hierarchical naming works, but mostly I added failOn to your optional SkipTo so that it won't skip past the pin_end expression:

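    from pyparsing import (Word, Keyword, Group, Optional, SkipTo,
                           alphanums, cStyleComment)

    # LP, RP, LCB, RCB are the literals defined in the question
    identifier    = Word(alphanums+'_/[]')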
    pin_dec       = Group(Keyword('pin') + LP + identifier('name') + RP)('pin_dec')
    pin_clk       = Group(Keyword('clock') + ':' + identifier('is_clk') + ';')('pin_clk')
    pin_dir       = Group(Keyword('direction') + ':' + identifier('dir') + ';')('pin_dir')
    pin_arc       = Group(Keyword('related_pin') 
                          + ':' 
                          + '"' + identifier('name') + '"' 
                          + ';')('pin_arc')
    pin_timing    = Group(Keyword('timing') 
                          + LP + RP 
                          + LCB 
                          + SkipTo(pin_arc) 
                          + Optional(pin_arc))('pin_timing')
    pin_end       = RCB + Optional(cStyleComment)
    pin           = Group(pin_dec 
                          + LCB 
                          + Optional(pin_clk) 
                          + Optional(pin_dir) 
                          + Optional(SkipTo(pin_timing, failOn=pin_end))
                          + SkipTo(pin_end) 
                          + pin_end)
    
    for parsed in pin.searchString(sample):
        print(parsed.dump())
        print()
    

    Giving:

    [[['pin', '(', 'core_c_sysclk', ')'], '{', ['clock', ':', 'true', ';'], ['direction', ':', 'input', ';'], 'capacitance :  0.0040;\n  max_transition :  0.1000;\n  related_ground_pin :   "vss" ;\n  related_power_pin :   "vcc" ;\n  fanout_load :  1.0000;\n  min_pulse_width_low :  0.1853;\n  min_pulse_width_high :  0.1249;', '', '}', '/* End of pin core_c_sysclk */']]
    [0]:
      [['pin', '(', 'core_c_sysclk', ')'], '{', ['clock', ':', 'true', ';'], ['direction', ':', 'input', ';'], 'capacitance :  0.0040;\n  max_transition :  0.1000;\n  related_ground_pin :   "vss" ;\n  related_power_pin :   "vcc" ;\n  fanout_load :  1.0000;\n  min_pulse_width_low :  0.1853;\n  min_pulse_width_high :  0.1249;', '', '}', '/* End of pin core_c_sysclk */']
      - pin_clk: ['clock', ':', 'true', ';']
        - is_clk: 'true'
      - pin_dec: ['pin', '(', 'core_c_sysclk', ')']
        - name: 'core_c_sysclk'
      - pin_dir: ['direction', ':', 'input', ';']
        - dir: 'input'
    
    [[['pin', '(', 'core_tx_td[9]', ')'], '{', ['direction', ':', 'output', ';'], 'capacitance :  0.0005;\n    max_transition :  0.1000;\n    related_ground_pin :   "vss" ;\n    related_power_pin :   "vcc" ;\n    max_fanout :  15.0000;\n    max_capacitance :  0.1000;\n\n    /* Start of rising_edge arc of pin core_tx_td[9] wrt pin core_tx_tclk */\n    ', 'timing() {                        <----WHAT I WANT (to know if this is in the pin)\n      timing_type : rising_edge ;\n      timing_sense : non_unate ;\n      min_delay_arc :   "true" ;\n      related_pin :" core_tx_tclk ";  <----WHAT I WANT (core_tx_tclk in this case)\n    rise_transition (lut_timing_4 ){\n       values(        REMOVED FOR CLARITY\n        );\n      ', '}']]
    [0]:
      [['pin', '(', 'core_tx_td[9]', ')'], '{', ['direction', ':', 'output', ';'], 'capacitance :  0.0005;\n    max_transition :  0.1000;\n    related_ground_pin :   "vss" ;\n    related_power_pin :   "vcc" ;\n    max_fanout :  15.0000;\n    max_capacitance :  0.1000;\n\n    /* Start of rising_edge arc of pin core_tx_td[9] wrt pin core_tx_tclk */\n    ', 'timing() {                        <----WHAT I WANT (to know if this is in the pin)\n      timing_type : rising_edge ;\n      timing_sense : non_unate ;\n      min_delay_arc :   "true" ;\n      related_pin :" core_tx_tclk ";  <----WHAT I WANT (core_tx_tclk in this case)\n    rise_transition (lut_timing_4 ){\n       values(        REMOVED FOR CLARITY\n        );\n      ', '}']
      - pin_dec: ['pin', '(', 'core_tx_td[9]', ')']
        - name: 'core_tx_td[9]'
      - pin_dir: ['direction', ':', 'output', ';']
        - dir: 'output'
    

    So you really were pretty close, just needed to structure Optional and SkipTo correctly, and add failOn and some Groups. The rest is pretty much the way you had it.
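
One thing to keep in mind: by default SkipTo only advances up to its target and does not consume it, which is why the dump above has no pin_timing entry. If you want the related pin as a named result, one option is to follow the same pattern as the toy example (SkipTo(num, failOn=end) + num) and put the target expression right after the SkipTo, inside the Optional. Here is a minimal sketch of that idea on a cut-down sample. The sample text, the 'related' results name, and the use of a bare closing brace as the failOn expression are my own simplifications, not part of your grammar, and the sketch assumes every timing() block contains a related_pin line.

```python
import pyparsing as pp

# Suppressed punctuation keeps the token lists short (a simplification).
LP, RP, LCB, RCB = map(pp.Suppress, "(){}")
identifier = pp.Word(pp.alphanums + "_/[]")

pin_dec = pp.Keyword("pin") + LP + identifier("name") + RP
pin_arc = (pp.Keyword("related_pin") + ":"
           + '"' + identifier("related") + '"' + ";")

# Inside timing(): skip to related_pin, but give up at a closing brace.
pin_timing = pp.Group(pp.Keyword("timing") + LP + RP + LCB
                      + pp.SkipTo(pin_arc, failOn=RCB)
                      + pin_arc)("pin_timing")

# SkipTo stops *before* pin_timing; the "+ pin_timing" then consumes it.
# If a closing brace comes first, the whole Optional matches nothing.
pin = (pin_dec + LCB
       + pp.Optional(pp.SkipTo(pin_timing, failOn=RCB) + pin_timing)
       + pp.SkipTo(RCB) + RCB)

# Hypothetical, heavily reduced stand-in for the LIB text.
sample = """
pin (clk) {
  direction : input ;
}
pin (td[9]) {
  direction : output ;
  timing() {
    timing_type : rising_edge ;
    related_pin : "tclk" ;
  }
}
"""

found = {}
for m in pin.searchString(sample):
    # "pin_timing" is only present when the Optional part matched.
    found[m["name"]] = m["pin_timing"]["related"] if "pin_timing" in m else None

print(found)  # {'clk': None, 'td[9]': 'tclk'}
```

As in the output above, each match still ends at the first closing brace it finds, so for pins with a timing() block the match stops inside that block. That is enough to pull out the name and related pin, but if you need the full pin body you would have to handle the nested braces as well.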