python, regex, python-re

Split on '.' when not preceded by a digit


I want to split '10.1 This is a sentence. Another sentence.' into ['10.1 This is a sentence', 'Another sentence'], and '10.1. This is a sentence. Another sentence.' into ['10.1. This is a sentence', 'Another sentence'].

I have tried

s.split(r'\D.\D')

It doesn't work. How can this be solved?


Solution

  • If you want to split a string on a . char that is neither preceded nor followed by a digit, and that is not at the end of the string, a splitting approach might work for you:

    re.split(r'(?<!\d)\.(?!\d|$)', text)
    

    See the regex demo.
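As a rough sketch of how the splitting approach could be used on the two strings from the question: re.split leaves the whitespace after each split point and the final, unsplit period in place, so the helper below (split_sentences is a hypothetical name, not part of the re module) trims both. It assumes a trailing period on a piece is never wanted.

```python
import re

# Split on a "." that is neither preceded nor followed by a digit
# and that is not at the end of the string.
SPLIT_RE = re.compile(r'(?<!\d)\.(?!\d|$)')

def split_sentences(text):
    """Hypothetical helper: split, then tidy each piece."""
    pieces = SPLIT_RE.split(text)
    # strip() drops the whitespace left after a split point;
    # rstrip('.') drops the final period, which the pattern does not split on.
    return [p.strip().rstrip('.') for p in pieces if p.strip()]

print(split_sentences('10.1 This is a sentence. Another sentence.'))
# ['10.1 This is a sentence', 'Another sentence']
print(split_sentences('10.1. This is a sentence. Another sentence.'))
# ['10.1. This is a sentence', 'Another sentence']
```

Note that str.split treats its argument as a literal separator rather than a regular expression, which is why s.split(r'\D.\D') leaves the string unsplit; pattern-based splitting needs re.split.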

    If your strings can contain more special cases, you could use a more customizable extracting approach:

    re.findall(r'(?:\d+(?:\.\d+)*\.?|[^.])+', text)
    

    See this regex demo. Details:

    • (?:\d+(?:\.\d+)*\.?|[^.])+ - a non-capturing group that matches one or more occurrences of
      • \d+(?:\.\d+)*\.? - one or more digits (\d+), then zero or more sequences of . and one or more digits ((?:\.\d+)*) and then an optional . char (\.?)
      • | - or
      • [^.] - any char other than a . char.
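The extracting approach can be sketched the same way. Each match keeps the whitespace that followed the previous period, so the helper below (extract_sentences is a hypothetical name) strips it; this assumes leading and trailing whitespace is never significant.

```python
import re

# One or more of: a dotted number with an optional trailing "." (e.g. "10.1" or "10.1."),
# or any single char other than ".".
EXTRACT_RE = re.compile(r'(?:\d+(?:\.\d+)*\.?|[^.])+')

def extract_sentences(text):
    """Hypothetical helper: collect the matches and trim surrounding whitespace."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return [m.strip() for m in EXTRACT_RE.findall(text) if m.strip()]

print(extract_sentences('10.1 This is a sentence. Another sentence.'))
# ['10.1 This is a sentence', 'Another sentence']
print(extract_sentences('10.1. This is a sentence. Another sentence.'))
# ['10.1. This is a sentence', 'Another sentence']
```

Because sentence-ending periods are never part of a match, no extra handling of the final period is needed here; further special cases can be added as alternatives before [^.].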