
Python positive-lookbehind split variable-width


I thought I had set up the expression correctly, but the split is not working as intended.

c = re.compile(r'(?<=^\d\.\d{1,2})\s+');
for header in ['1.1 Introduction', '1.42 Appendix']:
    print re.split(c, header)

Expected result:

['1.1', 'Introduction']
['1.42', 'Appendix']

I am getting the following stacktrace:

Traceback (most recent call last):
     File "foo.py", line 1, in <module>
          c = re.compile(r'(?<=^\d\.\d{1,2})\s+');
     File "C:\Python27\lib\re.py", line 190, in compile
          return _compile(pattern, flags)
     File "C:\Python27\lib\re.py", line 242, in _compile
          raise error, v # invalid expression
sre_constants.error: look-behind requires fixed-width pattern
<<< Process finished. (Exit code 1)


Solution

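  • Python's built-in re module requires lookbehind patterns to have a fixed width. Because \d{1,2} can match either one or two digits, your lookbehind is rejected when the pattern is compiled.

    As a workaround, put the prefix in a capturing group instead; re.split then includes the captured text in its result: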

    import re

    c = re.compile(r'(^\d\.\d{1,2})\s+')
    for header in ['1.1 Introduction', '1.42 Appendix']:
        # The match starts at position 0, so the first element is an empty string; drop it
        print(c.split(header)[1:])
    

    Output:

    ['1.1', 'Introduction']
    ['1.42', 'Appendix']
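
If you would rather keep the lookbehind approach, one possible alternative is to write one fixed-width lookbehind per allowed width and join them with alternation outside the lookbehind. The sketch below is only an illustration of that idea: the names SECTION_GAP and split_header are hypothetical, and it assumes headers of the form shown in the question (one digit, a dot, then one or two digits).

```python
import re

# Each lookbehind has a fixed width (3 and 4 characters respectively), so the
# re module accepts them. The alternation sits outside the lookbehinds, inside
# a non-capturing group, so re.split adds no extra elements to the result.
SECTION_GAP = re.compile(r'(?:(?<=^\d\.\d)|(?<=^\d\.\d\d))\s+')


def split_header(header):
    """Hypothetical helper: split 'N.M Title' into [number, title]."""
    return SECTION_GAP.split(header, maxsplit=1)


for header in ['1.1 Introduction', '1.42 Appendix', 'Preface']:
    print(split_header(header))
```

Unlike the capture-group version, this needs no [1:] slice, and a header without a section number (such as 'Preface' here) comes back as a one-element list rather than an empty one.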