Tags: php, python, regex, lookbehind

Regular Expression Lookbehind Issue


I'm trying to write a regular expression to pull blocks of text out of the history files I keep for projects I'm building. For now I plan to do this extraction manually in my text editor (either TextMate or Sublime Text 2), but eventually I'll build it into a scripted process using either Python or PHP (I haven't decided yet).

All of the history entries in my history file have the format:

YYYY-MM-DD - Chris -- Version: X.X.X
====================================
- Lorem ipsum dolor sit amet, vim id libris epicuri
- Et eos veri quodsi appetere, an qui saepe malorum eloquentiam.
...

--

where X.X.X is the version number the work was done under.

I'm trying to pull everything from the version number to the final double dash delimiter which denotes the end of the block of text.

I started by writing a regular expression that selects the section heading, which works:

(^[\d]{4}-[\d]{2}-[\d]{2}\s-\s[\w]+\s--\sVersion:\s)[\d\.]+$

But when I try to turn the pattern within my parentheses into a lookbehind, it fails:

(?<=^[\d]{4}-[\d]{2}-[\d]{2}\s-\s[\w]+\s--\sVersion:\s)[\d\.]+$ 

I've been looking around and so far it seems like this lookbehind format is correct. I can't seem to figure out what I'm missing. Any ideas?


Solution

  • Neither PHP nor Python allows variable-length lookbehind. As soon as the lookbehind contains an unbounded quantifier such as + (here, [\w]+), the pattern is rejected.

    So your first attempt, which matches the heading with a group instead of a lookbehind, is the approach that works here.
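
    For the scripted version, a minimal Python sketch of the group-based approach might look like the following. It assumes the entry format shown in the question; the sample date and version number, the names HEADING, ENTRY, lookbehind_compiles and extract_entries, and the choice to capture the block body up to the closing "--" line are all illustrative assumptions, not part of the original pattern.

```python
import re

# Heading prefix from the question (redundant [..] around \d and \w dropped).
HEADING = r"^\d{4}-\d{2}-\d{2}\s-\s\w+\s--\sVersion:\s"

# Hypothetical sample entry; the date and version number are made up.
SAMPLE = (
    "2012-03-14 - Chris -- Version: 1.2.3\n"
    "====================================\n"
    "- Lorem ipsum dolor sit amet, vim id libris epicuri\n"
    "\n"
    "--\n"
)


def lookbehind_compiles():
    """Return True if Python's re accepts the lookbehind form of the pattern."""
    try:
        re.compile(r"(?<=" + HEADING + r")[\d.]+$", re.MULTILINE)
    except re.error:
        # The \w+ inside the lookbehind makes its width variable.
        return False
    return True


# Group 1 is the version number; group 2 is everything after the heading
# line up to (not including) the closing "--" line.
ENTRY = re.compile(
    HEADING + r"([\d.]+)\n(.*?)^--$",
    re.MULTILINE | re.DOTALL,
)


def extract_entries(text):
    """Return a list of (version, body) tuples for each history entry."""
    return [(m.group(1), m.group(2)) for m in ENTRY.finditer(text)]


if __name__ == "__main__":
    print("lookbehind accepted:", lookbehind_compiles())
    for version, body in extract_entries(SAMPLE):
        print(version)
        print(body)
```

    Reading the version from a capturing group avoids the lookbehind entirely, so the variable-width [\w]+ in the heading is no longer a problem.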