I am parsing an NWChem output file, with text that looks like:
General Information
-------------------
SCF calculation type: DFT
Wavefunction type: closed shell.
No. of atoms : 10
No. of electrons : 36
Alpha electrons : 18
Beta electrons : 18
Charge : 0
Spin multiplicity: 1
Use of symmetry is: on ; symmetry adaption is: on
Maximum number of iterations: 30
AO basis - number of functions: 95
number of shells: 45
Convergence on energy requested: 1.00D-06
Convergence on density requested: 1.00D-05
Convergence on gradient requested: 5.00D-04
XC Information
--------------
I have saved the file into a string $str, and replaced each newline with я.
The above text occurs about 10 times in the file, so I want to capture all of the occurrences, using something like this to capture General Information:
my @capture = $str =~ m/General\s+Informationя
\s+[-]+я
(.+(?!\-{2,})) # negative lookahead, no more than 2 "-" characters
яя\s+[-]+
/xg;
The above regex grabs just about the entire file, which is not correct.
I've also tried (.+(?![\-]{2,})), which also captures way more text than it should.
How do I alter the regex (.+(?!\-{2,})) so that no more than 2 - characters are allowed within?
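In (.+(?!\-{2,})), the lookahead is tested only once, at the position where the greedy .+ stops, so it does not restrict what .+ itself consumes. The check has to be repeated for every line instead. The patterns below match against $_ and assume the text still has its real newlines rather than я.
To capture just the General Information section,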
my $gi = /
   ^
   \s* General[ ]Information \n    # A line with the header.
   \s* -{2,} \n                    # Followed by a separator line.
   (?: .* \n (?! \s* -- ) )*       # Lines not followed by a separator.
/xm ? $& : undef;
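As a minimal sketch of the single-section capture, the script below runs the same pattern against a shortened copy of the sample. It assumes the text still contains real newlines (not я). The variable names $text and $gi, the use of a capture group in place of $&, and the placeholder line standing in for the next section's contents are illustrative choices, not part of the original file.

```perl
use strict;
use warnings;

# Shortened sample based on the question; the last data line is a
# made-up placeholder for whatever the next section contains.
my $text = join "\n",
    'General Information',
    '-------------------',
    'SCF calculation type: DFT',
    'No. of atoms : 10',
    'Charge : 0',
    'XC Information',
    '--------------',
    '(contents of the next section)',
    '';    # trailing empty element so the text ends with a newline

# Same pattern as above, with a capture group instead of $&.
my ($gi) = $text =~ /
    (
        ^
        \s* General[ ]Information \n    # header line
        \s* -{2,} \n                    # separator line
        (?: .* \n (?! \s* -- ) )*       # lines not followed by a separator
    )
/xm;

print defined $gi ? $gi : "section not found\n";
```

The capture stops before the XC Information header because that line is followed by a separator line, so the lookahead rejects the line before it is consumed.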
To capture every section separately,
my @sections = /
   ^
   \s* \S.* \n                     # A line with the header.
   \s* -{2,} \n                    # Followed by a separator line.
   (?: .* \n (?! \s* -- ) )*       # Lines not followed by a separator.
/xmg;
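A possible way to use the multi-section match is sketched below, again on a shortened sample with real newlines. Grouping the sections by header in a hash of arrays is an assumption about what is convenient when the same section appears several times in one file. The names $text, %by_header and the placeholder data line are hypothetical.

```perl
use strict;
use warnings;

# Shortened sample based on the question; the last data line is a
# made-up placeholder for whatever the next section contains.
my $text = join "\n",
    'General Information',
    '-------------------',
    'SCF calculation type: DFT',
    'No. of atoms : 10',
    'Charge : 0',
    'XC Information',
    '--------------',
    '(contents of the next section)',
    '';    # trailing empty element so the text ends with a newline

# In list context, with the g modifier and no capture groups,
# the match operator returns every complete match.
my @sections = $text =~ /
    ^
    \s* \S.* \n                     # header line
    \s* -{2,} \n                    # separator line
    (?: .* \n (?! \s* -- ) )*       # lines not followed by a separator
/xmg;

# Group the sections by header (the first non-blank line of each match),
# keeping every occurrence because a header can repeat within one file.
my %by_header;
for my $section (@sections) {
    my ($header) = $section =~ /^\s*(.*?)\s*$/m;
    push @{ $by_header{$header} }, $section;
}

for my $header (sort keys %by_header) {
    printf "%s: %d occurrence(s)\n", $header, scalar @{ $by_header{$header} };
}
```

With the full file, each entry of $by_header{'General Information'} would hold one occurrence of that section, in file order.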