python pandas dataframe row

How can I connect specific rows in a Pandas dataframe?


I would like to connect specific rows in a Pandas dataframe.

I have a column "text" and another column "name". Every entry in the column "text" is a string. Some entries in the column "name" are empty, so I would like to merge each row n that has an empty "name" into row n-1. If row n-1 also has an empty "name", both rows should be merged into the closest preceding row that does have an entry in the column "name".

For example:
Input:

Text=["Abc","def","ghi","jkl","mno","pqr","stu"]

Name=["a","b","c","","","f","g"]

Expected Output:

Text= ["Abc","def","ghijklmno","pqr","stu"]

Name = ["a","b","c","f","g"]
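One way to get this output in pandas, as a rough sketch: number the rows so that every non-empty name starts a new group, then join the text within each group. It assumes the missing names are empty strings (for NaN values, `notna()` would replace `ne("")`) and that the first row has a name. The column label `group` and the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Abc", "def", "ghi", "jkl", "mno", "pqr", "stu"],
    "name": ["a", "b", "c", "", "", "f", "g"],
})

# Every row with a non-empty name starts a new group; rows with an empty
# name keep the group number of the closest named row above them.
group_id = df["name"].ne("").cumsum().rename("group")

merged = (
    df.groupby(group_id)
      # Join all text pieces of a group and keep the name of its first row.
      .agg(text=("text", "".join), name=("name", "first"))
      .reset_index(drop=True)
)

print(merged)
```

Because the grouping key is built with a cumulative sum, the length of a run of empty names does not matter.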

To make my question clearer:

I have two lists:

index = [3,6,8,9,10,12,15,17,18,19]
text = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
new = []
for i in range(0,len(text)):
    if i not in index:
        if i+1 not in index:
            new.append(text[i])
    if i in index:
        new.append(text[i-1]+' '+ text[i])

The list index holds the positions of the false splits in the text (the rows where the column name has no value). For example, text[3] should be appended to text[2], which gives the new entry 'c d'.

Finally, the output should be:

new = ['a','b','c d','e','f g','hijk','lm','n','op','qrst','u','v','w','x','y','z']

These lists are just a simplified example of my large text list. I don't know in advance how many consecutive entries have to be connected. My algorithm only works when entry n has to be connected with entry n-1, but I may also have to connect entry n with all entries back to n-10, which gives one large entry.

I hope my question is now more understandable.


Solution

  • I have a solution now (the code doesn't look good, but the output is what I expected):

    for i in range(0,len(text)):
        if i not in index:
            if i+1 not in index:
                new.append(text[i])
            elif i+1 in index:
                if i+2 not in index:
                    new.append(text[i]+text[i+1])
                elif i+2 in index:
                    if i+3 not in index:
                        new.append(text[i]+text[i+1]+text[i+2])
                    elif i+3 in index:
                        if i+4 not in index:
                            new.append(text[i]+text[i+1]+text[i+2]+text[i+3])
                        elif i+4 in index:
                            if i+5 not in index:
                                new.append(text[i]+text[i+1]+text[i+2]+text[i+3]+text[i+4])
    
    

    I still have to add a few more if conditions for longer runs, but for the simplified example above the code works perfectly.
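The nested conditions handle one fixed run length per level. A minimal sketch of the same idea for runs of any length is to append each false-split entry to the last entry already in the result. The function name `merge_false_splits` and the `sep` parameter are hypothetical, and the sketch assumes `index` holds 0-based positions in `text`.

```python
def merge_false_splits(text, index, sep=""):
    """Append every entry whose position is in `index` to the entry before it."""
    false_splits = set(index)  # set lookup avoids scanning the list on every step
    merged = []
    for i, entry in enumerate(text):
        if i in false_splits and merged:
            # False split: extend the entry that is currently being built.
            merged[-1] += sep + entry
        else:
            # Regular entry (or a false split at position 0): start a new one.
            merged.append(entry)
    return merged


index = [3, 6, 8, 9, 10, 12, 15, 17, 18, 19]
text = list("abcdefghijklmnopqrstuvwxyz")
new = merge_false_splits(text, index)
print(new)
```

Passing `sep=" "` would produce entries such as 'c d' instead of 'cd'.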