Tags: python, file, python-2.7, line-breaks

Why does file read/write add extra lines to the file?


I would like to unescape the Unicode escape sequences in a source file:

source = open('source.csv', 'r')
target = open('target.csv', 'w')
target.write(source.read().decode('unicode_escape').encode('utf-8'))

But the result file contains extra line breaks. For example, the text

u'\u0417a\u0439\u043c\u044b \u0412ce\u043c \u0436e\u043ba\u044e\u0449\u0438\u043c!\nO\u0434o\u0431\u0440e\u043d\u0438e 98%'

is replaced with

u'Зaймы Вceм жeлaющим!
Oдoбрeниe 98%'

I understand that there is a line break symbol \n in the source text, but I would like to keep it as is, without converting it to an actual line break.


Solution

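  • You're almost there. The unicode_escape codec decodes every escape sequence, so the escaped \n becomes a real line break. Process the file line by line and turn that decoded line break back into a literal \n:

    for line in source:
        # Drop the real line terminator so the replace below does not touch it.
        line = line.rstrip('\n')
        # Decode the escapes, then restore the escaped \n as a literal backslash-n.
        line = line.decode('unicode_escape').replace(u'\n', u'\\n').encode('utf8')
        # Put the real line terminator back.
        target.write(line + '\n')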
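For reference, here is a self-contained sketch of the same idea that should run on both Python 2.7 and Python 3. The function name `unescape_keep_newlines` is made up for illustration, and the file names are the ones from the question. It opens both files in binary mode so that each line is a byte string on either version. It assumes the input is plain ASCII containing escape sequences. The unicode_escape codec treats any other bytes as Latin-1, and it also decodes other escapes such as \t and \\, which this sketch leaves decoded.

```python
import io


def unescape_keep_newlines(src_path, dst_path):
    """Decode \\uXXXX escapes line by line, keeping an escaped \\n as literal text.

    Hypothetical helper; assumes the input is ASCII text with escape sequences.
    """
    # Binary mode gives byte strings on both Python 2 and Python 3,
    # and the with-block closes both files even if decoding fails.
    with io.open(src_path, 'rb') as source, io.open(dst_path, 'wb') as target:
        for raw in source:
            # Strip the real line terminator so it is not rewritten below.
            raw = raw.rstrip(b'\r\n')
            text = raw.decode('unicode_escape')
            # The codec turned the two characters "\" + "n" into a real
            # newline; turn it back into the two-character sequence.
            text = text.replace(u'\n', u'\\n')
            target.write(text.encode('utf-8') + b'\n')
```

With the files from the question, the call would be `unescape_keep_newlines('source.csv', 'target.csv')`.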