I want to split '10.1 This is a sentence. Another sentence.'
as ['10.1 This is a sentence', 'Another sentence']
and split '10.1. This is a sentence. Another sentence.'
as ['10.1. This is a sentence', 'Another sentence']
I have tried
s.split(r'\D.\D')
It doesn't work. How can this be solved?
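Note that str.split treats its argument as a literal substring, not a regex pattern, so s.split(r'\D.\D') only looks for the literal text \D.\D; use re.split instead.
If you plan to split a string on a . char that is neither preceded nor followed by a digit and is not at the end of the string, a splitting approach might work for you: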
re.split(r'(?<!\d)\.(?!\d|$)', text)
See the regex demo.
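As a rough sketch of how the splitting approach could be used on the two sample strings: re.split leaves the leading space and the final dot on the last piece, so some cleanup is needed to get the exact lists from the question. The helper name split_sentences and the cleanup step are my own assumptions, not part of the regex itself.

```python
import re

# A dot that is not preceded by a digit, not followed by a digit,
# and not at the very end of the string.
DOT_SPLIT = re.compile(r'(?<!\d)\.(?!\d|$)')

def split_sentences(text):
    """Hypothetical helper: split on sentence dots, then tidy each piece."""
    parts = DOT_SPLIT.split(text)
    # Assumption: surrounding whitespace and a trailing dot are unwanted.
    return [p.strip().rstrip('.') for p in parts if p.strip()]

print(DOT_SPLIT.split('10.1 This is a sentence. Another sentence.'))
# ['10.1 This is a sentence', ' Another sentence.']
print(split_sentences('10.1 This is a sentence. Another sentence.'))
# ['10.1 This is a sentence', 'Another sentence']
print(split_sentences('10.1. This is a sentence. Another sentence.'))
# ['10.1. This is a sentence', 'Another sentence']
```

Note that rstrip('.') would also remove the dot from a piece that ends in something like 10.1., so adjust the cleanup if that case matters to you.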
If your strings can contain more special cases, you could use a more customizable extracting approach:
re.findall(r'(?:\d+(?:\.\d+)*\.?|[^.])+', text)
See this regex demo. Details:
(?:\d+(?:\.\d+)*\.?|[^.])+ - a non-capturing group that matches one or more occurrences of:
  \d+(?:\.\d+)*\.? - one or more digits (\d+), then zero or more sequences of . and one or more digits ((?:\.\d+)*), and then an optional . char (\.?)
  | - or
  [^.] - any char other than a . char.
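The extracting approach could be tried like this. It is only a sketch: the helper name extract_sentences is made up for illustration, and the strip() call is an assumption added because the second match starts with the space that follows the sentence dot.

```python
import re

# Either a dotted number with an optional trailing dot, or any non-dot char,
# repeated one or more times.
NUMBER_OR_NON_DOT = re.compile(r'(?:\d+(?:\.\d+)*\.?|[^.])+')

def extract_sentences(text):
    """Hypothetical helper: collect the matches and trim whitespace."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return [m.strip() for m in NUMBER_OR_NON_DOT.findall(text) if m.strip()]

print(NUMBER_OR_NON_DOT.findall('10.1 This is a sentence. Another sentence.'))
# ['10.1 This is a sentence', ' Another sentence']
print(extract_sentences('10.1 This is a sentence. Another sentence.'))
# ['10.1 This is a sentence', 'Another sentence']
print(extract_sentences('10.1. This is a sentence. Another sentence.'))
# ['10.1. This is a sentence', 'Another sentence']
```

Here the sentence-ending dots are simply never matched, so no trailing dot has to be removed afterwards.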