
Python regex: Lookbehind + Lookahead with character set


I would like to parse the string 10M5D8P into a dictionary:

M:10, D:5, P:8 etc. ...

The string could be longer, but it's always a number followed by a single letter from this alphabet: MIDNSHP=X

As a first step I wanted to split the string with a lookbehind and lookahead, in both cases matching this regex: [0-9]+[MIDNSHP=X]

My current, non-working attempt looks like this:

import re

re.compile("(?<=[0-9]+[MIDNSHP=X])(?=[0-9]+[MIDNSHP=X])").split("10M5D8P")

It gives me an error message that I do not understand: "look-behind requires fixed-width pattern"


Solution

  • You may use re.findall.

    >>> import re
    >>> s = "10M5D8P"
    >>> {i[-1]:i[:-1] for i in re.findall(r'[0-9]+[MIDNSHP=X]', s)}
    {'M': '10', 'D': '5', 'P': '8'}
    >>> {i[-1]:int(i[:-1]) for i in re.findall(r'[0-9]+[MIDNSHP=X]', s)}
    {'M': 10, 'D': 5, 'P': 8}
    

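    Your regex does not work because the re module does not support variable-length lookbehind assertions: the pattern inside (?<=...) must match a fixed number of characters, and [0-9]+ does not. In addition, before Python 3.7 re.split did not split on zero-width matches, so a pattern such as (?<=\d)(?=[A-Z]) would not split the string either. From Python 3.7 onward, splitting on an empty match is supported.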
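
For reference, here is a minimal sketch of the split-based approach the question was aiming for. It assumes Python 3.7 or later, where re.split accepts a pattern that can match an empty string. The lookbehind is reduced to a single letter so that it is fixed-width. The variable names are illustrative only, and the second half shows a capture-group variant of the findall idea above.

```python
import re

s = "10M5D8P"

# Fixed-width lookbehind (exactly one letter) plus a lookahead for a digit.
# Assumption: Python 3.7+, since older versions do not split on empty matches.
boundary = re.compile(r"(?<=[MIDNSHP=X])(?=[0-9])")
tokens = boundary.split(s)

# Each token is a run of digits followed by one letter.
counts = {tok[-1]: int(tok[:-1]) for tok in tokens}

# Variant: capture groups make findall return (number, letter) tuples,
# so no string slicing is needed.
pairs = re.findall(r"([0-9]+)([MIDNSHP=X])", s)
counts_from_groups = {letter: int(number) for number, letter in pairs}

print(tokens)
print(counts)
print(counts_from_groups)
```

Both variants store one value per letter, so if a letter occurs more than once in the input, only its last count is kept.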