
How to reformat a string of sentences into one sentence per line in Python


I have a file that contains one big string. The string holds several sentences, each ending with three numbers, like so:

sees mouse . 1980 1 1 sheep erythrocytes mouse 1980 6 5 seen mouse 1980 8 8

I want to change this so that the file/output looks like this:

sees mouse . 1980 1 1
sheep erythrocytes mouse 1980 6 5
seen mouse 1980 8 8

Here is the code I have been using to try to solve this problem:

# Read the file line by line and print every word
with open('ngram_test') as f:
    for line in f:
        # print(line)
        for word in line.split():
            print(word)

This, however, only prints each word of the string on its own line. Any help would be greatly appreciated!
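For comparison, the word-by-word loop above could be made to work without a regex by collecting words until the last three form a sentence-ending number triple. This is a minimal sketch, assuming (as in the sample data) that every sentence ends with a 4-digit number followed by two 1- or 2-digit numbers; `split_sentences` is a hypothetical helper name, not part of the original question.

```python
def split_sentences(text):
    """Group whitespace-separated words into sentences (hypothetical helper).

    Assumes every sentence ends with a 4-digit number followed by
    two 1- or 2-digit numbers, as in the sample data.
    """
    sentences = []
    current = []
    for word in text.split():
        current.append(word)
        last3 = current[-3:]
        # Close the sentence once the last three words look like "1980 6 5"
        if (
            len(last3) == 3
            and last3[0].isdigit() and len(last3[0]) == 4
            and last3[1].isdigit() and len(last3[1]) <= 2
            and last3[2].isdigit() and len(last3[2]) <= 2
        ):
            sentences.append(" ".join(current))
            current = []
    if current:  # keep trailing words that did not end with a number triple
        sentences.append(" ".join(current))
    return sentences


if __name__ == "__main__":
    sample = "sees mouse . 1980 1 1 sheep erythrocytes mouse 1980 6 5 seen mouse 1980 8 8"
    for sentence in split_sentences(sample):
        print(sentence)
```

To process the actual file, pass its contents instead of the sample, e.g. `split_sentences(open('ngram_test').read())`.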


Solution

  • Using a regular expression, you can add a newline (\n) after each occurrence of the pattern:

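    import re

    s = "sees mouse . 1980 1 1 sheep erythrocytes mouse 1980 6 5 seen mouse 1980 8 8"
    # A sentence ends with a 4-digit number followed by two 1- or 2-digit numbers;
    # \s* also consumes the space after the triple so no line starts with a space
    pattern = r"(\d{4}\s\d{1,2}\s\d{1,2})\s*"
    # One pass: replace each match with the captured triple plus a newline
    s = re.sub(pattern, r"\1\n", s)
    print(s)

    Output:

    sees mouse . 1980 1 1
    sheep erythrocytes mouse 1980 6 5
    seen mouse 1980 8 8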
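Since the question is about a file, the regex approach can be wrapped in a small helper that reads the input file and writes the result. This is a sketch under assumptions: `one_sentence_per_line` and `reformat_file` are hypothetical names, and the pattern is the answer's `(\d{4}\s\d{1,2}\s\d{1,2})` with `\s*` appended so the whitespace after each triple is dropped.

```python
import re

# The answer's sentence-ending pattern plus \s*, so the space after each
# "4-digit + two numbers" triple is removed rather than starting the next line
SENTENCE_END = re.compile(r"(\d{4}\s\d{1,2}\s\d{1,2})\s*")


def one_sentence_per_line(text):
    """Return text with a newline after every sentence-ending number triple."""
    return SENTENCE_END.sub(r"\1\n", text.strip())


def reformat_file(src_path, dst_path):
    """Read src_path and write its sentences, one per line, to dst_path."""
    with open(src_path) as src:
        text = src.read()
    with open(dst_path, "w") as dst:
        dst.write(one_sentence_per_line(text))
```

Usage: `reformat_file('ngram_test', 'ngram_test_sentences')`, where the output file name is only an example. Reading the whole file at once is fine for a file that is "just one big string"; splitting it line by line, as in the question's loop, would not help here because the sentences are not on separate lines to begin with.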