Tags: python, string, split, whitespace, sentence

Split strings based on multiple delimiters while retaining them as well


I am entering five sentences and need to split them on multiple delimiters ('.', '!' and '?').

Unfortunately, when I wrote the code I only tested it with single letters separated by these delimiters, and I used .split(). That worked fine.

This was the code:

final_text = ''
split_one = ''
input_text = input("Enter the data: ")
count_d = input_text.count("!") + input_text.count("?") + input_text.count(".")

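if count_d == 5:
    final_text = input_text
    # Put a space after every delimiter...
    final_text = final_text.replace('!', '! ').replace('?', '? ').replace('.', '. ')
    # ...then split; split() with no argument splits on any run of whitespace
    split_one = final_text.split()
    # Print the first five pieces
    i = 0
    while True:
        print(split_one[i])
        i += 1
        if i == 5:
            break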

For the input:

a.b?c!d.f!

the output was:
a.
b?
c!
d.
f!

But I am actually entering sentences, not single letters, e.g.:

hi.how are you? I am good! what about you?bye!

It gives me:

hi.
how
are
you?
I

Instead of:

hi.
how are you?
I am good!
what about you?
bye!

How can I avoid splitting on whitespace and split only on the delimiters ('.', '!' and '?')?

PS: I am not allowed to use any external packages. The Python version is 3.6.
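str.split() with no argument splits on every run of whitespace, which is why the sentences fall apart at their spaces. One minimal change that keeps the original replace-then-split idea is to mark the end of each sentence with a separator that cannot occur in the input, and then split on that separator only. This sketch assumes a newline is such a separator (input() strips the trailing newline, so a single typed line contains none); the helper name split_on_delimiters is hypothetical:

```python
def split_on_delimiters(text, delimiters='.!?', sep='\n'):
    """Hypothetical helper: split text after each delimiter, keeping it.

    Assumes `sep` never occurs in `text` (a newline by default).
    """
    for d in delimiters:
        # Mark the end of each sentence with the separator instead of a space.
        text = text.replace(d, d + sep)
    # Split on the separator only (not on all whitespace) and drop empty pieces.
    return [part.strip() for part in text.split(sep) if part.strip()]


for sentence in split_on_delimiters('hi.how are you? I am good! what about you?bye!'):
    print(sentence)
```

Because the delimiters are replaced one at a time, a run such as 'wow?!' comes out as two pieces, 'wow?' and '!'.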


Solution

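  • You can use itertools.groupby to split the string at the punctuation. groupby groups consecutive characters that produce the same key, so with a key that tests whether a character is '.', '!' or '?', each run of sentence text and each run of punctuation becomes its own group. zip(*[iter(r)]*2) then pairs each sentence with the punctuation that follows it, e.g.: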

    >>> import itertools as it
    >>> s = 'hi.how are you? I am good! what about you?bye!'
    >>> r = [''.join(v).strip() for k, v in it.groupby(s, lambda c: c in '.!?')]
    >>> r
    ['hi', '.', 'how are you', '?', 'I am good', '!', 'what about you', '?', 'bye', '!']
    >>> for sentence, punct in zip(*[iter(r)]*2):
    ...     print(sentence + punct)
    hi.
    how are you?
    I am good!
    what about you?
    bye!
    

    If you don't need to keep the punctuation, keep only the groups whose key k is False:

    >>> [''.join(v).strip() for k, v in it.groupby(s, lambda c: c in '.!?') if not k]
    ['hi', 'how are you', 'I am good', 'what about you', 'bye']
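Another standard-library option, a different technique from the groupby answer rather than part of it, is re.split() with a capturing group: when the pattern contains a group, the matched delimiters are returned in the result list as well, so they can be reattached to each sentence. A minimal sketch; the function name split_sentences is hypothetical:

```python
import re


def split_sentences(text):
    """Hypothetical helper: split text after each run of '.', '!' or '?'."""
    # The capturing group makes re.split() return the matched delimiters too,
    # so parts alternates: text, delimiter, text, delimiter, ..., trailing text.
    parts = re.split(r'([.!?]+)', text)
    sentences = []
    # Join each text piece with the delimiter run that follows it.
    for i in range(0, len(parts) - 1, 2):
        sentences.append(parts[i].strip() + parts[i + 1])
    # Keep any trailing text that has no closing delimiter.
    tail = parts[-1].strip()
    if tail:
        sentences.append(tail)
    return sentences


s = 'hi.how are you? I am good! what about you?bye!'
for sentence in split_sentences(s):
    print(sentence)
```

Using [.!?]+ rather than [.!?] keeps a run such as '?!' or '...' together, as the groupby version does. Unlike the zip pairing, trailing text with no final punctuation is kept as a last sentence instead of being dropped.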