python, python-3.x, pandas, text, read-write

Python: writing a string to a maximum of one line in a text file


I have a dataframe, which can be downloaded here. The first column contains a question, while the second column contains an answer to that question.

My aim: To create two .txt files, one that contains questions and one that contains answers.

Each question and answer should be written on an individual row, so that row 50 in each .txt file contains the 50th question and the 50th answer (i.e. if the files are recombined, the question/answer pairs match up).

The code snippet below opens a text file, removes any \n from each row of the column, and writes the row to that file. It seems to work for about 96% of the rows, but occasionally it writes a single DF row across multiple text lines.

These rare events don't seem to have any defining characteristics; for example, the rows are not extremely long. For the file I attached above, the first one occurs at text file line 395 in the answers column.

from tqdm import tqdm

# data is the DataFrame loaded from the linked file
with open("Answers.txt", "a", newline="\n", encoding="utf-8") as f:
    for i in tqdm(data['answers_body']):
        line = i.replace('\n', '')
        f.write(line)
        f.write("\n")

Interestingly, if I remove the f.write calls and just print to the console, it seems to work as expected. The issue only occurs during the write process.
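
One way to look for a defining characteristic in the affected rows is sketched below. It rests on an assumption not confirmed anywhere in this thread: that the rows contain line-break characters other than \n (for example \r), which some editors also render as a new line. The function name `rows_with_hidden_breaks` and the sample list are hypothetical; in the question, the input would be `data['answers_body']`.

```python
import re

# Characters that str.splitlines() treats as line boundaries, besides "\n".
# Some text editors display these as line breaks as well.
OTHER_BREAKS = re.compile("[\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029]")


def rows_with_hidden_breaks(values):
    # Collect (index, repr) pairs for values that still contain a
    # line-boundary character after every "\n" has been removed.
    hits = []
    for idx, text in enumerate(values):
        stripped = str(text).replace("\n", "")
        if OTHER_BREAKS.search(stripped):
            hits.append((idx, repr(stripped)))
    return hits


# Hypothetical sample standing in for data['answers_body'].
sample = ["plain answer", "first part\r\nsecond part", "uses\u2028separator"]
for idx, shown in rows_with_hidden_breaks(sample):
    print(idx, shown)
```

If this reports no rows for the real column, the assumption does not hold and the cause lies elsewhere.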


Solution

  • Update: full version that results in 1001 lines

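    import csv

    data = []
    # newline='' lets the csv module handle line breaks inside quoted fields
    with open('SO_dataset.csv', newline='', encoding='utf-8') as csvfile:
        spamreader = csv.reader(csvfile)
        for row in spamreader:
            print(', '.join(row))
            # row[2] is taken as the answer text; short rows become ''
            data.append(row[2] if len(row) > 2 else '')

    # "w" starts from an empty file; the with block closes it afterwards
    with open("Answers.txt", "w", encoding='utf-8') as f:
        for i, line in enumerate(data, start=1):
            line = line.replace('\n', ' ')
            f.write(str(i) + '. ' + line)
            f.write("\n")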
    

    Actually, your original code seems fine. If you mean that the text file breaks a line and wraps it onto the next line, that is just how Notepad displays long lines. If you load the lines into Word or Excel, they should appear without the line break.
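
If the extra lines turn out to be real and not just editor wrapping, one possible cause (an assumption, not something confirmed in this thread) is a line-break character other than \n, such as \r, left in the text. A minimal sketch that removes every character Python's `str.splitlines()` treats as a line boundary, then reads the file back to check the line count, could look like this. The helper names `to_single_line` and `write_lines` and the sample data are hypothetical.

```python
import os
import tempfile


def to_single_line(text):
    # Split on every line boundary str.splitlines() knows
    # (\n, \r, \r\n, \u2028, ...) and rejoin the pieces with spaces.
    return " ".join(str(text).splitlines())


def write_lines(path, values):
    # "w" truncates the file, so leftovers from an earlier run
    # cannot shift the line numbers.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for value in values:
            f.write(to_single_line(value) + "\n")


# Hypothetical data standing in for the two DataFrame columns.
questions = ["How do I open a file?", "What is\r\na newline?"]
answers = ["Use open().", "A line\u2028terminator\rcharacter."]

out_dir = tempfile.mkdtemp()
q_path = os.path.join(out_dir, "Questions.txt")
a_path = os.path.join(out_dir, "Answers.txt")
write_lines(q_path, questions)
write_lines(a_path, answers)

# Read back without newline translation and count lines the strict way.
with open(q_path, encoding="utf-8", newline="") as f:
    question_lines = f.read().splitlines()
with open(a_path, encoding="utf-8", newline="") as f:
    answer_lines = f.read().splitlines()

assert len(question_lines) == len(questions)
assert len(answer_lines) == len(answers)
```

Note that the code in the question opens the file in append mode ("a"), so running it more than once adds to the existing lines. That alone would make the line numbers in the two files drift apart.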