Search code examples
pythonregexsplit

Split string at every position where an upper-case word starts


What is the best way to split a string like "HELLO there HOW are YOU" by upper-case words?

So I'd end up with an array like such: results = ['HELLO there', 'HOW are', 'YOU']

I have tried:

p = re.compile("\b[A-Z]{2,}\b")
print p.split(page_text)

It doesn't seem to work, though.


Solution

  • I suggest

    l = re.compile("(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)
    

    Check this demo.