I'm running a series of regexes against blocks of data. We recently upgraded from ActiveState Perl 5.8 32-bit (I know... extremely old!) to Perl 5.16 64-bit. All the hardware stayed the same (Windows).
We are noticing a performance hit: our parse loop used to take about 2.5 seconds, and now it takes about 5 seconds. Can anybody give me a hint as to what would cause the change? I was expecting an increase in performance, as my understanding was that the engine had improved greatly. Any docs on what I should be doing differently would be greatly appreciated.
Yes, the regex engine improved greatly after v8. In v10 alone, we saw additions such as the backtracking control verbs (*FAIL) and (*SKIP) and the \K operator. Also, more internals were made Unicode-aware.
In v12, the Unicode support was cleaned up. The \p and \X operators in regexes are now greatly enhanced.
In v14, the Unicode support was bumped to 6.0. Charnames for the \N operator were improved (see also the charnames pragma). The new character model can treat any unsigned integer as a codepoint. In the regex engine, the character set modifiers /u, /d, /l, /a, and /aa were added, non-destructive substitution with /r was implemented, and \p was cleaned up.
In v16, perl almost supports Unicode 6.1. In the regex engine, the efficiency of \p charclasses was increased.
Obviously, not all of these features come at a price, but Unicode-awareness in particular makes internals more complicated, and slower.
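To make the Unicode point concrete, here is a small sketch of how \d behaves with and without the /a modifier (which needs perl 5.14 or later). The variable names are made up for illustration. It only shows the difference in semantics; whether restricting classes to ASCII changes the timing of a given parse loop is something to measure, not assume.

```perl
use strict;
use warnings;

# U+0661 ARABIC-INDIC DIGIT ONE: a decimal digit outside the ASCII range.
my $arabic_one = "\x{0661}";

# The string holds a code point above 255, so Unicode rules apply
# and \d matches any Unicode decimal digit.
my $unicode_match = $arabic_one =~ /\A\d\z/ ? 1 : 0;

# With /a, \d (like \s, \w and the POSIX classes) matches ASCII only.
my $ascii_match = $arabic_one =~ /\A\d\z/a ? 1 : 0;

print "Unicode \\d: $unicode_match, ASCII-only \\d: $ascii_match\n";
# prints "Unicode \d: 1, ASCII-only \d: 0"
```

If the data is known to be plain ASCII, writing [0-9] explicitly expresses the same intent and also works on older perls.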
You also cannot wave a hand and state that the execution time of a script doubled from perl5 v8 x86 to perl5 v16 x64; there are too many variables. Basically, you have to compare the whole perl -V output.
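As a rough starting point for that comparison, the core Config module exposes the same build settings that perl -V prints. The sketch below dumps a handful of them; run it under both installations and diff the results. The choice of keys is my own guess at what tends to differ between builds, not an exhaustive list.

```perl
use strict;
use warnings;
use Config;

# Build settings that often differ between two perl installations.
# This selection is illustrative only; perl -V shows the full picture.
my @keys = qw(version archname cc optimize usethreads useithreads
              usemultiplicity use64bitint ivsize ptrsize usemymalloc);

# Settings that are not defined for this build are shown as 'undef'.
my %build = map { $_ => (defined $Config{$_} ? $Config{$_} : 'undef') } @keys;

printf "%-16s %s\n", $_, $build{$_} for @keys;
```

The script avoids newer syntax on purpose so that it also runs on the old 5.8 installation.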
If you are hitting a performance ceiling with regexes, they may be the wrong tool for extensive parsing. At the very least, you may use the newer features to optimize the regexes to eliminate some backtracking.
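Two of the features added in v10 that can cut down on backtracking are possessive quantifiers and \K. The following is a minimal sketch with made-up sample strings, not a rewrite of any particular parser; it only shows what the constructs do.

```perl
use strict;
use warnings;

# Greedy a+ gives one "a" back so the trailing "a" can match.
my $greedy = "aaa" =~ /\Aa+a\z/ ? 1 : 0;

# Possessive a++ never gives characters back, so the match fails
# immediately instead of retrying shorter alternatives.
my $possessive = "aaa" =~ /\Aa++a\z/ ? 1 : 0;

# Where giving characters back could never help, the possessive form
# returns the same result but skips pointless retries on failure.
my ($quoted) = 'say "hello" now' =~ /"([^"]*+)"/;

# \K drops everything matched so far from the match, so only the
# digits are replaced and no capture group is needed.
(my $renamed = "price=10") =~ s/price=\K\d+/20/;

print "$greedy $possessive $quoted $renamed\n";
# prints "1 0 hello price=20"
```

Whether such changes actually help depends on the patterns and the data, so it is worth timing the old and new versions of a regex (for example with the core Benchmark module) before committing to a rewrite.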
If your parsing code describes a (roughly) context-free language (i.e. you don't use (?{...}), (?=...) or related regex features), and parsing means doing something like generating a tree, then Marpa::R2 might speed things up considerably.