I have a file that is just one big string. In this string there are sentences that end with 3 numbers like so:
sees mouse . 1980 1 1 sheep erythrocytes mouse 1980 6 5 seen mouse 1980 8 8
I want to change this so that the file/output looks like this:
sees mouse . 1980 1 1
sheep erythrocytes mouse 1980 6 5
seen mouse 1980 8 8
Here is the code I have been using to try and solve this problem:
with open('ngram_test') as f:
    for line in f:
        #print(line)
        for word in line.split():
            print(word)
This, however, only prints each word of the string on its own line. Any help would be greatly appreciated!
Using a regex, you can add a newline (\n) after each occurrence of the pattern:
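import re

s = "sees mouse . 1980 1 1 sheep erythrocytes mouse 1980 6 5 seen mouse 1980 8 8"
# A four-digit year, then two one- or two-digit numbers, plus any trailing whitespace.
pattern = r"(\d{4}\s\d{1,2}\s\d{1,2})\s*"
# A single pass: keep the captured numbers and put a newline where the whitespace was.
# (Calling re.sub once per match would add extra newlines if the same numbers repeat.)
s = re.sub(pattern, r"\1\n", s)
Output:
'sees mouse . 1980 1 1\nsheep erythrocytes mouse 1980 6 5\nseen mouse 1980 8 8\n'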
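To apply the same idea to the file from the question rather than a hard-coded string, something like the following might work. It is only a sketch: the helper names `split_records` and `split_file` are made up for illustration, and it assumes every record ends in a four-digit year followed by two one- or two-digit numbers, as in the sample.

```python
import re

# Assumed record ending: a four-digit year and two one- or two-digit numbers.
# The \b anchors keep the pattern from matching inside longer digit runs.
RECORD_END = re.compile(r"\b(\d{4}\s+\d{1,2}\s+\d{1,2})\b\s*")


def split_records(text):
    """Return text with a newline after every 'year n n' triple."""
    # \1 re-inserts the captured numbers; the whitespace that followed
    # them is replaced by a single newline.
    return RECORD_END.sub(r"\1\n", text)


def split_file(path):
    """Read the whole file as one string and split it into records."""
    with open(path) as f:
        return split_records(f.read())
```

With the file from the question, `print(split_file('ngram_test'), end='')` should print one record per line. Reading the file with `f.read()` instead of looping over words keeps the text as one string, which is what the substitution needs.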