Objective: to extract each paragraph from a large text file and store it in a .csv file. A blank line ("\n") acts as the delimiter.
This is the code I'm using:
    import csv
    input_file = open('path', 'r')
    output_file = open('path', 'a+')
    writer = csv.writer(output_file)
    list = []
    for line in input_file:
        if line != "\n":
            list.append(line)
        else:
            writer.writerow(list)
            list.clear()
The goal here is to append each line to a list until we encounter a "\n", and then store the contents of the list in a single cell of the .csv file.
The code runs fine, but for some reason each line is being written to a separate column/cell instead of the entire paragraph going into a single cell.
Expected output:
row 1:
this is stackoverflow
python language is used.
Current output:
row 1:
this is stackoverflow | python language is used.
What am I missing here?
The writerow method writes its argument as a single row, i.e. each element of the list goes into a separate column of that row. To get the whole paragraph into one cell, pass a list with a single element.
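To make the difference concrete, here is a minimal sketch that writes the two sample lines from the question into in-memory buffers instead of real files (the io.StringIO buffers and the variable names are only for illustration):

```python
import csv
import io

lines = ["this is stackoverflow\n", "python language is used.\n"]

# Passing the list directly: one column per list element.
buf = io.StringIO()
csv.writer(buf).writerow(lines)
multi_column = buf.getvalue()

# Joining the lines first and wrapping the result in a one-element list:
# the whole paragraph ends up in a single cell.
buf = io.StringIO()
csv.writer(buf).writerow(["".join(lines).rstrip("\n")])
single_cell = buf.getvalue()

# Read both results back to count the columns in each row.
rows_multi = list(csv.reader(io.StringIO(multi_column, newline="")))
rows_single = list(csv.reader(io.StringIO(single_cell, newline="")))
```

The csv module quotes any field that contains a line break, so the joined paragraph stays in one cell when the file is read back.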
Wouldn't the following do the trick for you?
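    import csv

    # Split the whole file into paragraphs on blank lines.
    with open(input_path, 'r') as f:
        paragraphs = f.read().split('\n\n')

    # newline='' lets the csv module control the line endings.
    with open(output_path, 'w', newline='') as f:
        writer = csv.writer(f)
        # Each row is (index, paragraph): the paragraph lands in one cell.
        writer.writerows(enumerate(paragraphs))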
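If the file is too large to read in one go, the line-by-line loop from the question can be kept and only the writing step changed. The following is a rough sketch, not a drop-in replacement: iter_paragraphs is a hypothetical helper name, the input is simulated with io.StringIO, and whitespace-only lines are assumed to count as blank.

```python
import csv
import io


def iter_paragraphs(lines):
    """Yield each blank-line-separated paragraph as one string."""
    current = []
    for line in lines:
        if line.strip():
            current.append(line.rstrip("\n"))
        elif current:
            yield "\n".join(current)
            current = []
    if current:
        # Last paragraph when the input has no trailing blank line.
        yield "\n".join(current)


# Stand-in for the real input file (assumption for this example).
sample = io.StringIO(
    "this is stackoverflow\npython language is used.\n\nsecond paragraph\n"
)
out = io.StringIO()
writer = csv.writer(out)
for paragraph in iter_paragraphs(sample):
    # A one-element list gives one cell per row.
    writer.writerow([paragraph])

rows = list(csv.reader(io.StringIO(out.getvalue(), newline="")))
```

With real files, sample and out would be replaced by the opened input and output files, the latter opened with newline=''.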