
How to make recurrent list of lists with while-loop


I have a file in the format turn_index \t sentence \t metadata. It looks like this, where the length of each dialogue (i.e. its number of turns) varies:

0 hello metadata1
1 hi! metadata2
0 hi there metadata3
1 how are you? metadata4
2 very well meta5
3 I'm so busy today meta6
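Since every line carries three tab-separated fields, it can help to split each line once, up front, into a (turn_index, sentence) tuple. This is a minimal sketch that assumes the fields are separated by tab characters, as the format line states. The helper name parse_turns and the in-memory sample (standing in for the real file) are hypothetical:

```python
import io


def parse_turns(lines):
    """Split each 'turn_index<TAB>sentence<TAB>metadata' line into (int, str)."""
    parsed = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines, e.g. an empty last line
        # maxsplit=2 keeps any extra tabs inside the metadata field
        index, sentence, _metadata = line.split("\t", 2)
        parsed.append((int(index), sentence))
    return parsed


# Hypothetical sample standing in for data.readlines()
sample = io.StringIO(
    "0\thello\tmetadata1\n"
    "1\thi!\tmetadata2\n"
    "0\thi there\tmetadata3\n"
    "1\thow are you?\tmetadata4\n"
    "2\tvery well\tmeta5\n"
    "3\tI'm so busy today\tmeta6\n"
)
turns = parse_turns(sample.readlines())
print(turns[:2])  # [(0, 'hello'), (1, 'hi!')]
```

Converting the index to int at parse time means later code can compare turn numbers directly instead of re-splitting strings.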

I would like to pair up consecutive turns into two-sentence lists, and group all pairs from the same dialogue into one big list:
[["hello", "hi!"]]
[["hi there", "how are you?"], ["how are you?", "very well"], ["very well", "I'm so busy today"]]
My attempt at windowing the sentences two at a time is not working, and I can't even begin to figure out how to group them per dialogue. My code is the following:

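turns = data.readlines()  # data is the open input file
window_size = 2
i = 0
j = 0
dialogue = []
while i < len(turns) - window_size + 1:
    restart = False
    dialogue = []
    for turn in turns:
        sec = turn.rstrip().split("\t")  # [turn_index, sentence, metadata]
        double_sent = [sec[0], sec[1]]   # one line's index and sentence, not two sentences
        i += 1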

Solution

  • A solution that fits the edited output. The dialogues list will hold all the lists of lists you mentioned.

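    dialogues = []    # one entry (a list of sentence pairs) per dialogue
    double_sent = []  # sentence pairs of the dialogue currently being built
    for line1, line2 in zip(turns[:-1], turns[1:]):
        # consecutive turn indices: both lines belong to the same dialogue
        if int(line2.split('\t')[0])-int(line1.split('\t')[0]) == 1:
            double_sent.append([line1.split('\t')[1], line2.split('\t')[1]])
        else:
            # the index dropped back to 0: the current dialogue is complete
            dialogues.append(double_sent)
            double_sent = []
    # the last dialogue is never followed by a reset, so append it after the loop
    dialogues.append(double_sent.copy())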
    

    Here

    zip(turns[:-1], turns[1:])
    

    is a neat idiom for selecting every pair of consecutive elements of a sequence. It is definitely worth remembering.
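To see what the zip expression produces, here is a small stand-alone check on the second dialogue's sentences (copied from the sample data). Each tuple is converted with list() only to match the requested output format:

```python
sentences = ["hi there", "how are you?", "very well", "I'm so busy today"]

# Pair element k with element k + 1 by zipping the list with a shifted copy.
pairs = [list(p) for p in zip(sentences[:-1], sentences[1:])]
print(pairs)
# [['hi there', 'how are you?'], ['how are you?', 'very well'], ['very well', "I'm so busy today"]]

# zip stops at the shorter input, so the [:-1] slice is optional.
assert pairs == [list(p) for p in zip(sentences, sentences[1:])]

# A one-sentence list has no consecutive pair, so nothing is produced.
print(list(zip(["hello"][:-1], ["hello"][1:])))  # []
```

On Python 3.10 and later, itertools.pairwise(sentences) yields the same pairs (as tuples) without building the sliced copies.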

    The next line

    if int(line2.split('\t')[0])-int(line1.split('\t')[0]) == 1
    

    checks whether the turn indices of the two selected lines are consecutive. Assuming the numbering is correct, this condition fails only when the index switches back to 0, which indicates that a dialogue has finished and can be appended to the dialogues list. If the numbering contains an error (for example, a skipped or repeated index), this gives wrong output.
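If the numbering cannot be trusted to be strictly consecutive, one variant is to split on the reset itself: start a new dialogue whenever the turn index is 0, then build the pairs inside each dialogue. This is a different technique from the difference check above, sketched under the assumption that every dialogue starts at index 0 and that the lines have already been split into (turn_index, sentence) tuples. The function names are hypothetical:

```python
def group_dialogues(turns):
    """Group (turn_index, sentence) tuples into dialogues; index 0 starts a new one."""
    dialogues = []
    for index, sentence in turns:
        if index == 0 or not dialogues:
            dialogues.append([])  # start a new dialogue
        dialogues[-1].append(sentence)
    return dialogues


def window_pairs(dialogues):
    """Turn each dialogue's list of sentences into a list of consecutive pairs."""
    return [[[a, b] for a, b in zip(d[:-1], d[1:])] for d in dialogues]


turns = [(0, "hello"), (1, "hi!"), (0, "hi there"), (1, "how are you?"),
         (2, "very well"), (3, "I'm so busy today")]
print(window_pairs(group_dialogues(turns)))
# [[['hello', 'hi!']], [['hi there', 'how are you?'], ['how are you?', 'very well'], ['very well', "I'm so busy today"]]]
```

With this split, a skipped index such as 0, 1, 3 no longer cuts a dialogue in two, and a dialogue with only one turn produces an empty list of pairs.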

    # Output
    >>> dialogues
    [[['hello', 'hi!']], [['hi there', 'how are you?'], ['how are you?', 'very well'], ['very well', "I'm so busy today"]]]
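Since the question title asks for a while loop, here is a sketch of the same grouping written with explicit while loops, with the window_size variable from the question's attempt turned into a parameter. It assumes the lines were already split into (turn_index, sentence) tuples and that each dialogue starts at index 0; windows_per_dialogue is a hypothetical name:

```python
def windows_per_dialogue(turns, window_size=2):
    """Collect sliding windows of sentences per dialogue using while loops.

    turns: list of (turn_index, sentence) tuples; index 0 starts a dialogue.
    """
    dialogues = []
    i = 0
    while i < len(turns):
        # j scans forward to the next index-0 turn (or the end of the list)
        j = i + 1
        while j < len(turns) and turns[j][0] != 0:
            j += 1
        sentences = [sentence for _, sentence in turns[i:j]]
        # slide a window of window_size sentences across this dialogue
        windows = []
        k = 0
        while k + window_size <= len(sentences):
            windows.append(sentences[k:k + window_size])
            k += 1
        dialogues.append(windows)
        i = j  # continue with the next dialogue
    return dialogues


turns = [(0, "hello"), (1, "hi!"), (0, "hi there"), (1, "how are you?"),
         (2, "very well"), (3, "I'm so busy today")]
print(windows_per_dialogue(turns))
# [[['hello', 'hi!']], [['hi there', 'how are you?'], ['how are you?', 'very well'], ['very well', "I'm so busy today"]]]
```

In Python the for/zip form above is usually preferred; the while version mainly makes the index bookkeeping visible, and a larger window_size (e.g. 3) works without other changes.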