I am trying to print a list of sentences from a text file (one of the Project Gutenberg eBooks). When I print the file as a single string string it looks fine:
file = open('11.txt','r+')
alice = file.read()
print(alice[:500])
Output is:
ALICE'S ADVENTURES IN WONDERLAND
Lewis Carroll
THE MILLENNIUM FULCRUM EDITION 3.0
CHAPTER I. Down the Rabbit-Hole
Alice was beginning to get very tired of sitting by her sister on the
bank, and of having nothing to do: once or twice she had peeped into the
book her sister was reading, but it had no pictures or conversations in
it, 'and what is the use of a book,' thought Alice 'without pictures or
conversations?'
So she was considering in her own mind (as well as she could, for the
hot d
Now, when I split it into sentences (The assignment was specifically to do this by "splitting at the periods," so it's a very simplified split), I get this:
>>> print(sentences[:5])
["ALICE'S ADVENTURES IN WONDERLAND\n\nLewis Carroll\n\nTHE MILLENNIUM FULCRUM EDITION 3", '0\n\n\n\n\nCHAPTER I', " Down the Rabbit-Hole\n\nAlice was beginning to get very tired of sitting by her sister on the\nbank, and of having nothing to do: once or twice she had peeped into the\nbook her sister was reading, but it had no pictures or conversations in\nit, 'and what is the use of a book,' thought Alice 'without pictures or\nconversations?'\n\nSo she was considering in her own mind (as well as she could, for the\nhot day made her feel very sleepy and stupid), whether the pleasure\nof making a daisy-chain would be worth the trouble of getting up and\npicking the daisies, when suddenly a White Rabbit with pink eyes ran\nclose by her", "\n\nThere was nothing so VERY remarkable in that; nor did Alice think it so\nVERY much out of the way to hear the Rabbit say to itself, 'Oh dear!\nOh dear! I shall be late!' (when she thought it over afterwards, it\noccurred to her that she ought to have wondered at this, but at the time\nit all seemed quite natural); but when the Rabbit actually TOOK A WATCH\nOUT OF ITS WAISTCOAT-POCKET, and looked at it, and then hurried on,\nAlice started to her feet, for it flashed across her mind that she had\nnever before seen a rabbit with either a waistcoat-pocket, or a watch\nto take out of it, and burning with curiosity, she ran across the field\nafter it, and fortunately was just in time to see it pop down a large\nrabbit-hole under the hedge", '\n\nIn another moment down went Alice after it, never once considering how\nin the world she was to get out again']
Where do the extra "\n" characters come from and how can I remove them?
If you want to replace all the newlines with one space, do this:
import re
new_sentences = [re.sub(r'\n+', ' ', s) for s in sentences]