Tags: python, regex, findall

Python regex pattern to slice a string into numbered items


I am looking for a regex pattern that lets me slice a string into its numbered items. The string looks something like this:

text = ('1. first slice 2. second slice 3. slice number 3 4. the next one '
        '5 that will not work but belong to no four 5. and this should be 5 and '
        'so one...')

I want to get this:

  1. first slice
  2. second slice
  3. slice number 3
  4. the next one 5 that will not work but belong to no four
  5. and this should be 5 and so on...

I hope the idea is clear.

So far I have found that I can use this:

import re

parts = re.findall(r"\d\. \D+", text)

That works well until it encounters a lone digit inside an item. I know that \D matches any non-digit character, so I also tried:

parts = re.findall(r"\d\. .+", text)

or

parts = re.findall(r"(\d\.).*", text)

and many others, but I can't find the right one.

I will be grateful for your help.


Solution

  • You could use a negative lookahead:

    parts = re.findall(r"\d\. (?:\D+|\d(?!\.))*", text)
    

    This matches a digit, a dot, and a space, followed by any run of characters in which no digit is directly followed by a dot.

    Demo:

    >>> import re
    >>> text = '1. first slice 2. second slice 3. slice number 3 4. the next one 5 that will not work but belong to no four 5. and this should be 5 and so one...'
    >>> re.findall(r"\d\. (?:\D+|\d(?!\.))*", text)
    ['1. first slice ', '2. second slice ', '3. slice number 3 ', '4. the next one 5 that will not work but belong to no four ', '5. and this should be 5 and so one...']
    

    Online demo at https://regex101.com/r/kF9jT1/1; to simulate the re.findall() behaviour I added an extra (..) and the g flag.
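
The same idea can be sketched with re.VERBOSE so that each part of the pattern carries a comment. This is only an illustration, not part of the original answer: the names ITEM and split_items are made up here, the \d+ generalization assumes item numbers may have more than one digit, and the strip() call assumes the trailing spaces in the findall() result are unwanted.

```python
import re

text = ('1. first slice 2. second slice 3. slice number 3 4. the next one '
        '5 that will not work but belong to no four 5. and this should be 5 and '
        'so one...')

# Hypothetical helper pattern. In VERBOSE mode a literal space must be
# written as [ ] because plain whitespace in the pattern is ignored.
ITEM = re.compile(r"""
    \d+\.[ ]            # item marker: digits, a dot, a space
    (?:
        \D+             # a run of non-digit characters
      | \d+(?!\d*\.)    # or digits that do not lead into a dot (next marker)
    )*
""", re.VERBOSE)


def split_items(s):
    """Return the numbered items of s with surrounding whitespace removed."""
    return [m.strip() for m in ITEM.findall(s)]


parts = split_items(text)
for part in parts:
    print(part)
```

With the sample text this prints the five items one per line, without trailing spaces. Note that, like the original pattern, this sketch treats any digits followed by a dot as a new item marker, so text containing decimals such as 3.5 would be split in the wrong place.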