
Negative lookahead not working in Perl regex


I am parsing an NWChem output file, with text that looks like:

    General Information
    -------------------
  SCF calculation type: DFT
  Wavefunction type:  closed shell.
  No. of atoms     :    10
  No. of electrons :    36
   Alpha electrons :    18
    Beta electrons :    18
  Charge           :     0
  Spin multiplicity:     1
  Use of symmetry is: on ; symmetry adaption is: on 
  Maximum number of iterations:  30
  AO basis - number of functions:    95
             number of shells:    45
  Convergence on energy requested:  1.00D-06
  Convergence on density requested:  1.00D-05
  Convergence on gradient requested:  5.00D-04

      XC Information
      --------------

I have read the file into a string $str and replaced each newline with я. The above text occurs about 10 times in the file, and I want to capture every occurrence. I am using something like this to capture General Information:

my @capture = $str =~ m/General\s+Informationя
\s+[-]+я
(.+(?!\-{2,})) # negative lookahead, no more than 2 "-" characters
яя\s+[-]+
/xg;

The above regex grabs just about the entire file, which is not correct.

I've also tried (.+(?![\-]{2,})) which also captures way more text than it should.

How do I alter the regex (.+(?!\-{2,})) so that the captured text never contains a run of two or more - characters?


Solution
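
Some background on why the attempt over-captures: a lookahead is tested only at the position where it appears in the pattern. In (.+(?!\-{2,})) that position is reached once, after the greedy .+ has already consumed as much as it can, so the check says nothing about the text inside the capture. The sketch below illustrates this on a small made-up string (the contents of $text are hypothetical, and /s is used so that . crosses newlines, which stands in for the я substitution). It also shows a per-character variant, (?:(?!--).)+, which does stop at the dashes but still swallows the next header line.

```perl
use strict;
use warnings;

# Hypothetical miniature input: two content lines, a blank line,
# then the next header and its separator line.
my $text = "a: 1\nb: 2\n\nNext\n----\n";

# The attempt from the question: .+ runs to the end of the string,
# and only then is the lookahead tested. Nothing follows the end,
# so the lookahead succeeds and the whole string is captured.
my ($greedy) = $text =~ /(.+(?!-{2,}))/s;

# Per-character check: the lookahead is tested before every single
# character, so the match stops right before the first "--".
my ($tempered) = $text =~ /((?:(?!-{2,}).)+)/s;

print "greedy captured everything\n" if $greedy eq $text;
print "tempered capture:\n$tempered";
```

Because the per-character form still includes the "Next" header, working line by line, as in the patterns that follow, is the more natural fit for this file layout.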

  • To capture just the General Information section (the patterns here match against $_ and expect the original newlines, not the я substitution),

    my $gi = /
       ^
       \s* General[ ]Information \n  # A line with the header
       \s* -{2,} \n                  # Followed by a separator line.
       (?: .* \n (?! \s* -- ) )*     # Lines not followed by a separator.
    /xm ? $& : undef;
    

    To capture every section separately,

    my @sections = /
       ^
       \s* \S.* \n                   # A line with the header.
       \s* -{2,} \n                  # Followed by a separator line.
       (?: .* \n (?! \s* -- ) )*     # Lines not followed by a separator.
    /xmg;
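
As a rough, self-contained illustration of the section pattern above, the following sketch binds it to a variable with =~ instead of relying on $_. The sample text in $str is a shortened, hypothetical stand-in for the real NWChem output, and the names $section_re and @gi are introduced here only for the example.

```perl
use strict;
use warnings;

# Hypothetical, shortened stand-in for the NWChem output (real newlines).
my $str = <<'END';
    General Information
    -------------------
  SCF calculation type: DFT
  Charge           :     0

      XC Information
      --------------
  placeholder line

END

# Same pattern as above, precompiled so it can be reused.
my $section_re = qr/
   ^
   \s* \S.* \n                   # A line with the header.
   \s* -{2,} \n                  # Followed by a separator line.
   (?: .* \n (?! \s* -- ) )*     # Lines not followed by a separator.
/xm;

# No capture groups, so m//g in list context returns every full match.
my @sections = $str =~ /$section_re/g;

# Keep only the sections whose header line is "General Information".
my @gi = grep { /\A \s* General[ ]Information \n/x } @sections;

printf "%d sections, %d of them General Information\n",
    scalar @sections, scalar @gi;
print "[$_]\n" for @gi;
```

Filtering the full list with grep is one way to collect every General Information block when the text occurs several times in the file. The blank line before the next header ends up in the preceding section, since only a line directly followed by a separator is excluded.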