Suppose I have a string such as this:
"IgotthistextfromapdfIscraped.HowdoIsplitthis?"
And I want to produce:
"I got this text from a pdf I scraped. How do I split this?"
How can I do it?
It turns out that this task is called word segmentation, and there is a Python library, wordsegment, that can do it:
>>> from wordsegment import load, segment
>>> load()
>>> segment("IgotthistextfromapdfIscraped.HowdoIsplitthis?")
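['i', 'got', 'this', 'text', 'from', 'a', 'pdf', 'i', 'scraped', 'how',
'do', 'i', 'split', 'this']
Note that the result is lowercased and the punctuation is dropped, so this does not give you exactly the sentence you asked for: capitalization and sentence boundaries have to be restored separately.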
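If you are curious how this kind of segmentation can work, here is a minimal sketch of the general idea: score every possible split by word probability and keep the best one, using dynamic programming. This is not the wordsegment implementation. The vocabulary, the counts in WORD_COUNTS, the unknown-word penalty and the name toy_segment are all made up for illustration; a real segmenter would use frequencies from a large corpus.

```python
import math
from functools import lru_cache

# Hypothetical toy vocabulary with invented counts (an assumption for this
# sketch); real segmenters rely on frequencies from a large text corpus.
WORD_COUNTS = {
    "i": 50, "got": 20, "this": 40, "text": 10, "from": 30, "a": 60,
    "pdf": 5, "scraped": 2, "how": 25, "do": 30, "split": 3,
}
TOTAL = sum(WORD_COUNTS.values())
MAX_WORD_LEN = max(len(word) for word in WORD_COUNTS)


def word_score(word):
    """Log-probability of a word; unknown words are penalized by length."""
    if word in WORD_COUNTS:
        return math.log(WORD_COUNTS[word] / TOTAL)
    # Arbitrary penalty (an assumption): longer unknown words score worse.
    return math.log(1 / (TOTAL * 10 ** len(word)))


def toy_segment(text):
    """Split text into the most probable word sequence under WORD_COUNTS."""
    # Lowercase and keep only letters and digits before segmenting.
    text = "".join(ch for ch in text.lower() if ch.isalnum())

    @lru_cache(maxsize=None)
    def best(start):
        # Returns (score, words) for the best split of text[start:].
        if start == len(text):
            return (0.0, ())
        candidates = []
        last_end = min(len(text), start + MAX_WORD_LEN)
        for end in range(start + 1, last_end + 1):
            word = text[start:end]
            rest_score, rest_words = best(end)
            candidates.append((word_score(word) + rest_score, (word,) + rest_words))
        return max(candidates)

    return list(best(0)[1])


words = toy_segment("IgotthistextfromapdfIscraped.HowdoIsplitthis?")
print(" ".join(words))
```

Joining the list with " ".join(...) works the same way on the output of segment() and gives you a single space-separated string, still lowercase and without punctuation.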