Tags: python, regex, csv, cookbook

Splitting strings with multiple delimiters in Python with re.split (from Python Cookbook)


Chapter 2, Section 2.1 of Python Cookbook, 3rd Edition, has the following example:

>>> line = 'asdf fjdk; afed, fjek,asdf,      foo'
>>> import re
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']

It is a nice example, but when I shorten the regex to [;,\s]* it still has the same effect, see below:

>>> re.split(r'[;,\s]*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']

So what does the author have in mind with the seemingly redundant \s*? Does it serve any purpose that the simpler, shorter version does not?

Please share your input.


Solution

  • I don't have the book, so I don't know the authors' intent. But David Beazley is as sharp as they come, so my guess is that it was to distinguish between the output for these two lines.

    >>> line = 'asdf fjdk; afed, fjek,asdf,      foo'
    >>> line = 'asdf fjdk; ; afed, fjek,asdf,      foo'
    

    Using the regex from the book, the second line would be split into

    ['asdf', 'fjdk', '', 'afed', 'fjek', 'asdf', 'foo']
    

    And using your modified regex

    ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
    

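    Your regex collapses any run of consecutive characters from the class [;,\s] into a single delimiter, so the empty field between the two semicolons disappears. The book's regex consumes exactly one delimiter character plus any whitespace that follows it, so a second delimiter right after the first produces an empty string in the result.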
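
The difference described above can be reproduced with a short script. This is a minimal sketch, not the book's code: the names `book_fields` and `collapsed_fields` are made up for illustration, and the collapsing pattern is written as `[;,\s]+` rather than `[;,\s]*`. That substitution is an assumption of the sketch: since Python 3.7, `re.split` also splits on empty matches, so a pattern that can match the empty string would break the line up between every character there, whereas `+` gives the collapsing behaviour on any version.

```python
import re

line = 'asdf fjdk; ; afed, fjek,asdf,      foo'

# Pattern from the book: exactly one delimiter character,
# followed by any amount of trailing whitespace.
book_fields = re.split(r'[;,\s]\s*', line)

# Collapsing variant (assumed stand-in for the modified regex):
# one or more delimiter characters in a row count as a single separator.
collapsed_fields = re.split(r'[;,\s]+', line)

print(book_fields)       # ['asdf', 'fjdk', '', 'afed', 'fjek', 'asdf', 'foo']
print(collapsed_fields)  # ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
```

Which one is preferable probably depends on the data: for CSV-like input, where 'a,,b' is meant to carry an empty middle field, the book's pattern keeps that field, while the collapsing pattern silently drops it.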