python database csv dataset data-analysis

Convert a .txt file to .csv, where each line goes to a new column and each paragraph goes to a new row


I am relatively new to working with txt and JSON datasets. I have a dialogue dataset in a txt file and I want to convert it into a CSV file, with each line converted into a column. When the next dialogue starts (the next paragraph), it should start a new row, so that I get data in the format:

Header = ['Q1', 'A1', 'Q2', 'A2', ...]

Here is the data for reference (this file is in txt format): dialog data

1 hello hello what can i help you with today
2 may i have a table in a moderate price range for two in rome with italian cuisine i'm on it
3 <SILENCE> ok let me look into some options for you
4 <SILENCE> api_call italian rome two moderate

1 hi    hello what can i help you with today
2 can you make a restaurant reservation in a expensive price range with british cuisine in rome for eight people    i'm on it
3 <SILENCE> ok let me look into some options for you
4 <SILENCE> api_call british rome eight expensive

1 hi    hello what can i help you with today
2 may i have a table in london with spanish cuisine i'm on it
3 <SILENCE> how many people would be in your party
4 we will be six    which price range are looking for
5 i am looking for a moderate restaurant    ok let me look into some options for you
6 <SILENCE> api_call spanish london six moderate

Solution

  • A CSV file is a list of strings separated by commas, with newlines (\n) separating the rows.

    Because of this simple layout, plain comma-joined text breaks down when a string itself contains a comma, as dialogue often does, unless that field is quoted.

    That said, for your input file you can use a regex to replace every single newline with a comma, which satisfies the "each line becomes a column, each paragraph becomes a row" requirement.

    import re
    
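    # Read the whole file so the regex can see the blank lines between dialogues.
    with open('input.txt', 'r') as reader:
        text = reader.read()

    # Replace the newline that ends each non-blank line with a comma. "." does not
    # match "\n", so the second newline of each blank-line separator survives and
    # ends the row.
    text = re.sub(r"(...)\n", r"\1,", text)
    print(text)

    with open('output.csv', 'w') as writer:
        writer.write(text)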
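
To see what that substitution actually does, here is a small sketch that runs the same regex on a made-up in-memory string instead of a file. The sample text and the variable name `converted` are only illustrative.

```python
import re

# Two tiny made-up "dialogues" separated by a blank line.
text = "a1 xx\nb2 yy\n\nc3 zz\nd4 ww\n"

# "(...)" captures the three characters just before a newline. Because "."
# never matches "\n", the second newline of the blank-line separator is not
# preceded by three such characters and is left in place as the row break.
converted = re.sub(r"(...)\n", r"\1,", text)

print(repr(converted))  # 'a1 xx,b2 yy,\nc3 zz,d4 ww,'
```

Two side effects are worth knowing about: every row ends with a trailing comma, and a line shorter than three characters would keep its newline because the pattern cannot match it.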
    
    

    Working example here.
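
If the dialogue may contain commas, or you want the Q1, A1, Q2, A2 header from the question, a possible alternative is Python's built-in csv module, which quotes fields as needed. This is only a sketch under assumptions: it assumes each line is a turn number, then the user utterance, then the reply, with the two separated by a tab (or a run of two or more spaces). The helper names `dialogues_to_rows` and `write_csv` and the sample text are made up for illustration.

```python
import csv
import io
import re


def dialogues_to_rows(text):
    """Turn blank-line-separated dialogues into flat [Q1, A1, Q2, A2, ...] rows."""
    rows = []
    # One or more blank lines separate dialogues (paragraphs).
    for block in re.split(r"\n\s*\n", text.strip()):
        row = []
        for line in block.splitlines():
            # Drop the leading turn number, e.g. "3 ".
            body = re.sub(r"^\d+\s+", "", line.strip())
            # Assumption: question and answer are separated by a tab
            # or by a run of two or more spaces.
            parts = re.split(r"\s*\t\s*|\s{2,}", body, maxsplit=1)
            question = parts[0]
            answer = parts[1] if len(parts) > 1 else ""
            row.extend([question, answer])
        rows.append(row)
    return rows


def write_csv(rows, out):
    """Write rows under a Q1, A1, ... header, padding shorter dialogues."""
    turns = max(len(row) for row in rows) // 2
    header = [f"{kind}{i}" for i in range(1, turns + 1) for kind in ("Q", "A")]
    writer = csv.writer(out)
    writer.writerow(header)
    for row in rows:
        writer.writerow(row + [""] * (len(header) - len(row)))


# Made-up sample in the assumed tab-separated layout.
sample = (
    "1 hi\thello what can i help you with today\n"
    "2 <SILENCE>\tapi_call british rome eight expensive\n"
    "\n"
    "1 hi\thello, what can i help you with today\n"
)
buffer = io.StringIO()
write_csv(dialogues_to_rows(sample), buffer)
print(buffer.getvalue())
```

To write a real file instead of the in-memory buffer, pass a file opened with `open('output.csv', 'w', newline='')`; the csv module documentation recommends `newline=''` so that row endings are not translated twice. If a line has no recognizable separator, this sketch puts the whole line in the question column and leaves the answer empty.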